Schema - Do you use Natural or Surrogate primary keys?

Last updated by ChristianMWaite about 4 years ago.See history

Now, this is a controversial one. Which one do you use?

  1. A "Natural" (or "Intelligent") key is actual data

    • Surname, FirstName, DateOfBirth
  2. An "Acquired Surrogate" (or "Artificial" or "System Generated") key is NOT derived from data eg. Autonumber

    • eg. ClientID 1234
    • eg. ClientID JSKDYF
    • eg. ReceiptID 1234
  3. A "Derived Surrogate" (or "User Provided") key is indirectly derived from data eg. Autonumber

    • eg. ClientID SSW (for SSW)
    • eg. EmpID AJC (for Adam Jon Cogan)
    • eg. ProdID CA (for Code Auditor)
  4. A "GUID" key automatically generated by SQL Server

The problems with Natural Keys:

  • Because they have a business meaning, if that meaning changes (eg. they change their surname), then that value NEEDS to change. Changing a value with data is a little hard - but a lot easier with Cascade Update.
  • The main problem is that the key is large and combined and this needs to be used in all joins

The Problem with Acquired Surrogate Keys:

  • A surrogate key has no meaning to a user
  • It always requires a join when browsing a child table eg. The InvoiceDetail table

The Problem with Derived Surrogate

  • The user needs to enter a unique value
  • Because they have a business meaning, if that meaning changes (eg. they change their company name), then that value MAY NEED to change. Changing a value with data is a little hard - but a lot easier with Cascade Update
  • More likely to have a problem with Merge Replication

The Problem with GUID key

We like GUID keys. However, GUID generation produces essentially random numbers which cannot realistically be used as primary keys since new rows are inserted into the table at random positions leading to extremely slow inserts as the table grows to a moderate size. Inserting into the middle of a table with a clustered index, rather than appending to the end can potentially cause the database to have to move large portions of the data to accommodate space for the insert. This can be very slow.

Recommendations

  1. We do not use Natural keys ever
  2. We use Acquired Surrogate for some tables

    • eg. Invoice table
    • eg. Receipt table
  3. a combination of Acquired Surrogate and Derived Surrogate for other tables

    • eg. Customer table
    • eg. Employee table
    • eg. Product table

When we say combination because if the user doesn't enter a value then we put a random value in (by a middle-tier function, so it works with Access or SQL). eg. ClientID JSKDYF The user can then change the value to anything else and we validate it is not used, and then perform a cascade update - or if it is more then 3 levels deep we execute a stored proc. Unfortunately, this is a complicated proc that cycles through all related tables and performs an UPDATE. Here is an example.

The Derived Surrogate has the benefit of being easy for people to remember and can be used in the interface or even the query string

Over the years experience has lead me to the opinion that the natural vs surrogate key argument comes down to a style issue. If a client or employer has a standard one way or another, fine use it. If not, use whichever you method you prefer, recognizing that there may be some annoyances you face down the road. But don't let somebody criticize you because your style doesn't fit his preconceived notions.

We open source. Powered by GitHub