Do you dream to be a 'Schema Master' one day? If so you need to know what changes are low impact and what needs to be done with care. Take care when it involves existing data. Do you know what the hard ones are?
Let's look at examples of this increasing complexity (using The Mr Northwinds database) :
-
ALTER TABLE dbo.Employees
ADD Gender bit NOT NULL
GO
- Figure: Add a column (Easy)
-
ALTER TABLE dbo.Employees
DROP COLUMN TitleOfCourtesy
GO
- Figure: Delete a column (Easy)
-
EXECUTE sp_rename N'dbo.Employees.HireDate',
N'Tmp_StartDate_1', 'COLUMN'
GO
EXECUTE sp_rename N'dbo.Employees.Tmp_StartDate_1',
N'StartDate', 'COLUMN'
GO
- Figure: Rename a column (Medium)
-
CREATE TABLE dbo.Tmp_Employees
(
...
Gender char(2) NULL,
...
) ON [PRIMARY]
TEXTIMAGE_ON [PRIMARY]
...
IF EXISTS(SELECT * FROM dbo.Employees)
EXEC('INSERT INTO dbo.Tmp_Employees (..., Gender,...)
SELECT ...,Gender ,...
FROM dbo.Employees WITH (HOLDLOCK TABLOCKX)
')
...
GO
DROP TABLE dbo.Employees
GO
EXECUTE sp_rename N'dbo.Tmp_Employees', N'Employees', 'OBJECT'
GO
- Figure: Change data type (Hard) e.g. Bit to Integer. The above is abbreviated, see the full .SQL file
-
CREATE TABLE dbo.Tmp_Employees
(
...
Gender int NULL,
...
) ON [PRIMARY]
TEXTIMAGE_ON [PRIMARY]
...
IF EXISTS(SELECT * FROM dbo.Employees)
EXEC('INSERT INTO dbo.Tmp_Employees (..., Gender,...)
SELECT ...,CASE Gender WHEN ''F'' THEN ''0''
WHEN ''M'' THEN ''1''
WHEN ''NA'' THEN ''2''
WHEN ''U'' THEN ''3''
ELSE ''-1''
END AS Gender ,...
FROM dbo.Employees WITH (HOLDLOCK TABLOCKX)
')
...
GO
DROP TABLE dbo.Employees
GO
EXECUTE sp_rename N'dbo.Tmp_Employees', N'Employees', 'OBJECT'
GO
- Figure: Change data type (Very Hard) e.g. Text to Integer. Text to Integer and data conversion requires "Data Motion Scripts". The above is abbreviated, see the full .SQL file
And the point of know this. Well no tool out there, not Redgate's SQL Compare, not Microsoft's Data Dude, nor SSW's SQL Deploy will do this automagically for you. So you better understand that this stuff is delicate.
Every time a change is made to your product's SQL Server Database, script out the change. You can use SQL Management Studio or VS.NET (you can find old guys that still use Enterprise Manager or Query Analyzer), but every time you make changes you must save the change as a .sql script file so any alterations are scripted.
Everything you do on your database will be done at least three times ( once on development, once test and once on production). Change control is one of the most important processes to ensuring a stable database system.
Keep the scripts in a separate directory with only .sql files
eg. C:\Program Files\SSW Time PRO.NET\SQLScripts (32 bit)
or C:\Program Files (x86)\SSW Time PRO.NET\SQLScripts (64 bit)
Later on you will get these 7 benefits:
- When you have an error you can see exactly which script introduced it
- You don't have to use a compare tool like Red-Gate SQL Compare at the end of your development cycle
- Your application can automatically make schema changes
- The application can have a "Create" database button when installed for the first time
- The application can have an "Upgrade" button and work out itself if this new version needs scripts to be run
- The application can tell if it is an old version (as a newer version may have upgraded the schema), so you only use the latest clients
- The application can have a "Reconcile" feature that compares the current schema to what it should be
- Figure: A list of change SQL scripts, each file name is in the correct format
Is there a file naming convention to follow?
The script file naming convention should be XXXXX_ObjectType_ObjectName_ColumnName_Description_SchemaMasterInitials.sql
eg. 00089_Table_Employee_Gender_ChangeFromBitToChar_AC.sql
SQL Compare is a good tool to find out the differences between two databases. It can help you answer the question "Is your database the same as mine?".
Let's see what it is good at.
- Figure: You can use SQL Compare to make two databases the same
- Figure: SQL Compare clearly shows some tables are missing
So if you want to compare 2 databases SQL Compare (or Data Dudes Compare) is great tools. They even let you synchronize sweetly between these 2 databases. However, if you are doing this at the end of your release cycle, you have a problem. Your schema deployment process is broken.
What you should be doing is seeing your Schema Master each time you have a new .sql file. You do this during the development process, not at the end in the package and deployment process.
- Figure: Give your SQL scripts to 'Schema Master' who will, check them into TFS, then run them
Note: We have a tool called SQL Deploy to help with automatic deployment.
You have a web site master right? This is the central point of contact if the site goes down.
When developing an application, all members can code. However schema changes being done by many developers often can lead to trouble.
Who is "Schema Master"? What does he do?
- Figure: One person should be the 'Schema Master', on an average sized project (of 5-10 devs)
If your project has a database, you need to select a "Schema Master". This is the one person who is allowed to make modifications to the database. These include:
The "Schema Master" in a development shop is often the lead programmer on the team. He is in charge of all database changes and scripting them out and checking them into source control.
- Figure: The Applications Database stores version info in a table called _zsVersion
- Figure: Only a "Schema Master" checks in the .sql files
Having many people in a company that are able to make schema changes, can only lead to big problems. This gets worse if the application is powerful (eg. enabled with SSW SQL Deploy that can make schema changes itself) can make schema changes.
Let's see how to fix the issue:
To avoid this problem, only one person (the "Schema Master") should have permissions to upgrade the database.
- Figure: The db_owner role is granted for one person only – the "Schema Master"
- Figure: And here is the "Schema Master" at SSW
You get an error message reported from a user like:
When I click the Save button on the product form it gives an error message about a missing field.
- Figure: The developer thinks "what could be wrong"
- Figure: The developer tests then replies "Works on my machine"
You try and reproduce it on your version in the office and everything works perfectly.
You suspect that the customer probably has changed the schema. So you start drafting an email to the user like:
Mary, I need you to send me your database schema as it might be different from what it should be. Can you:
- Open up Enterprise Manager in you are on SQL 2000 (or open SQL Management Studio if you are on SQL 2005, 2008 or 2010)
- Open the first tree
- Open the second tree
- Select your server
- Open that tree
- Select Databases
- Open that tree
- Select the database called Northwind
- Right click it and choose All Tasks, then Generate SQL Script
- Then select the options
- etc
- Then when I get this I will compare and I will make a script file for you to run and fix the problem
STOP! STOP! STOP!
It would be much better to just say:
Mary, click the "Reconcile" button and it will tell us what is wrong
Bottom line is the customers' database schema should always be correct, should be managed automatically by the app and if it is not, it is their problem.
Therefore, you should deliver an application with the buttons "Create", Upgrade" and "Reconcile", accessible via "Tools - Options" and a "Database" tab. We do this by using SSW SQL Deploy and throwing on the inherited user-control from the SSW.SQLDeploy.Options project.
For more information see Best Tools for SQL Server
It looks like this
Figure: When weird errors are happening at a client, you need a "Reconcile" button in your application. This compares the current scripts, to the client's database and tells you if things are not right Figure: First time your client opens the application, they will need to Creating a database. It should be as easy as clicking "Create"
As a developer, I promise to do these 3 things:
- Save every SQL change I do as a script
- Make sure the application I develop, has 3 buttons, "Create", "Update" and "Reconcile"
- Never ask a client to run a script
- Figure: Adam makes all his new developers swear in and repeat this
Application upgrade might not only include the .exe and .dll but the database changes. How to deploy these changes, manually or using tools to deploy?
Let's see the bad and good examples:
-
Dear Mr Northwind,
Before installing your application, you need to
run this script by
first opening up SQL Management Studio.
Open the attached script, point it to Northwind and
execute the script.
Let me know if you have any issues...
We worked very hard on this release.
I hope you’re happy with it.
Regards,
Eric Phan
- Figure: Bad example - run SQL scripts manually
-
Hi Mr. Northwind,
Please run the attached Northwind_v5.exe.
Click Run when the prompt appears.
Regards,
Eric Phan
- Figure: Better example - run SQL scripts using another package
-
Dear Mr Northwind,
When you run the Northwind v1.0 (Rich Client) it will
automatically upgrade the database for you.
Just make sure you have dbo permissions:
Let me know if you run into any issues,
otherwise have a great day.
Regards,
Eric Phan
- Figure: Best example - run SQL scripts in the application
- Figure: Deploy SQL scripts by the application itself
We have a tool called SQL Deploy can do this.
Controlled Lookup Data is when data is tightly coupled to the application. If the data is not there, you have problems. So how do we check to see if data is still there?
The simplest way is to add a procValidate (Stored Procedure) to check that all the lookup data is still there.
- Figure: procValidates are just like a nagging wife
Let's look at an example, of a combo that is populated with Controlled Lookup data (just 4 records)
- Figure: How do I make sure these 4 records never go missing?
-
CREATE PROCEDURE procValidate_Region
AS
IF EXISTS(SELECT TOP 1 * FROM dbo.[Region]
WHERE RegionDescription = 'Eastern')
PRINT 'Eastern is there'
ELSE
RAISERROR(N'Lack of Eastern', 10, 1)
IF EXISTS(SELECT TOP 1 * FROM dbo.[Region]
WHERE RegionDescription = 'Western')
PRINT Western is there'
ELSE
RAISERROR(N'Lack of Western', 10, 1)
IF EXISTS(SELECT TOP 1 * FROM dbo.[Region]
WHERE RegionDescription = 'Northern')
PRINT 'Northern is there'
ELSE
RAISERROR(N'Lack of Northern', 10, 1)
IF EXISTS(SELECT TOP 1 * FROM dbo.[Region]
WHERE RegionDescription = 'Southern')
PRINT 'Southern is there'
ELSE
RAISERROR(N'Lack of Southern', 10, 1)
- Figure: Implement a stored procedure to check the 'Controlled Lookup Data' does not go missing
Note: As this procedure will be executed many times, it must be Idempotent
Lookup data is data that you usually see in combo boxes. It may be a Customer Category, a Product Color or the Order Status. Usually this is defined by the user and the programmer does not care what or how many records they have. When the programmer relies on records being in the lookup table, it is called 'Controlled Lookup Data'.
So whenever you have special data, which is referenced in code you need to tread carefully by:
1) First understanding that although most of the time there is a clear separation between data and schema, there is an exception for Controlled Lookup Data. This is when data (aka Controlled Lookup Data) is tightly coupled to the application, meaning that you have an application that cannot function correctly without that data.
2) You need to deploy that 'Controlled Lookup Data'
3) You then need to add a check for it so that it does not disappear.
Let's look at an example:
- Figure: This combo looks innocent. However if it is "Billable" then the calendar goes yellow
- Figure: Billable days are shown in yellow
-
if (drDay.NotBillableCount == 0 &&
drDay.BillableCount > 0)
{
//Yellow Background
cell.BackColor = Color.FromArgb(255, 255, 140);
cell.BackColor2 = Color.FromArgb(255, 255, 140);
}
else if (drDay.BillableCount > 0)
{
cell.BackColor = Color.FromArgb(255, 255, 140);
cell.BackColor2 = Color.LightGray;
}
else
{
cell.BackColor = Color.LightGray;
cell.BackColor2 = Color.LightGray;
}
- Figure: I think we have "Controlled Lookup Data" here, because if the "BillableCount" is greater than 0, the color shown will be yellow
-
INSERT INTO dbo.[EmpTimeBillable]
([CategoryID], [CategoryName], [DateCreated],
[DateUpdated], [EmpUpdated], [Note], [rowguid],
[Colour])
VALUES
('ALL', '', '09/13/2009 00:00:00',
'09/13/2009 00:00:00',
'SSW-AdamCogan',
'Used for reports -
Excluded in Timesheets and Tasklist data entry',
'{A9A009A9-4E19-4FD3-B86A-B9260067D0EF}',
'White')
GO
INSERT INTO dbo.[EmpTimeBillable]
([CategoryID], [CategoryName], [DateCreated],
[DateUpdated],[EmpUpdated], [Note], [rowguid],
[Colour])
VALUES
('B', 'Billable', '07/01/2009 00:00:00',
'07/01/2009 00:00:00',
'SSW-AdamCogan',
'DON’T CHANGE - These are hard coded',
'{F410C25D-1F1A-4340-B7A4-7A4AAE037708}',
'Yellow')
GO
INSERT INTO dbo.[EmpTimeBillable]
([CategoryID], [CategoryName], [DateCreated],
[DateUpdated], [EmpUpdated], [Note], [rowguid],
[Colour])
VALUES
('BPP', 'Prepaid Billable', '02/28/2009 15:30:19',
'02/28/2009 00:00:00',
'SSW-AdamCogan',
'DON’T CHANGE - These are hard coded',
'{608AA6FF-B3C5-47BE-AC9A-29553E89643D}',
'LightYellow')
GO
INSERT INTO dbo.[EmpTimeBillable]
([CategoryID], [CategoryName], [DateCreated],
[DateUpdated], [EmpUpdated], [Note], [rowguid],
[Colour])
VALUES
('U', 'Unknown', '07/01/2009 00:00:00',
'07/01/2009 00:00:00',
'SSW-AdamCogan',
'DON’T CHANGE - These are hard coded',
'{74937D60-D2B2-4A4D-96AD-7F5B1941B244}',
'White')
GO
INSERT INTO dbo.[EmpTimeBillable]
([CategoryID], [CategoryName], [DateCreated],
[DateUpdated], [EmpUpdated], [Note], [rowguid],
[Colour])
VALUES
('W', 'W/Off', '07/01/2009 00:00:00',
'07/01/2009 00:00:00',
'SSW-AdamCogan',
'DON’T CHANGE - These are hard coded',
'{D51513CE-8A1D-41E4-93C4-3E827FF7522B}',
'LavenderBlue')
GO
- Figure: This data must be deployed, just like we deploy schema
Now you need to add a procValidate, see Do you check your "Controlled Lookup Data" (aka Reference Data) is still there with procValidate?
Scripting out a schema change is easy, worrying about data is not. "'Data motion" refers to a change in the meaning of data, which will require scripts which touch data and schema.
Let's look at an example:
We have a 'Gender' column (that is a Boolean) storing 0's and 1's. All works well for a while.
- Figure: Anything wrong this Gender column?
Later you learn you need to change the data type to char(2) to support 'M', 'F', 'T', 'NA' and 'U'
- Figure: Caster Semenya has taught us a thing or two about the right data type for Gender
The data then must be migrated to the new data type this way:
- Rename 'Gender' to 'ztGender' *
- Add a new column 'Gender' with type char(2)
- Insert the existing data from 'ztGender' to 'Gender' (map 0 to 'F' and 1 to 'M')
- Delete the column ztGender*
*Note: zt stands for Temporary
- Figure: Changing the data type and data required a "Data Motion Script"
Visual Studio does not automatically support this scenario, as data type changes are not part of the refactoring tools. However, if you add pre and post scripting events to handle the data type change the rest of the changes are automatically handled for you.
Good Example - Don't use Data Dude
note: In order to achieve this you MUST use the built in Refactor tools as it create a log of all the refactors in order. This helps Visual Studio generate the schema compare and make sure no data is lost.
90% of the databases that SSW works with make use of denormalized fields. We believe this is with good reason. However, several precautions should be taken to ensure that the data held within these fields is reliable. This is particularly the case several applications are updating your denormalized data. To illustrate, let's say that we want to show all Customers with a calculated field totalling their order amount (ie Customer.OrderTotal).
With this example in mind, the main reasons we use denormalized fields are:
Reducing development complexity A denormalized field can mean that all SELECT queries in the database are simpler. Power users find it easier to use for reporting purposes - without the need for a cube. In our example, we would not need a large view to retrieve the data (as below).
SELECT Customer.CustomerID,
SUM (SalesOrderDetail.OrderQty *
(SalesOrderDetail.UnitPrice -
SalesOrderDetail.UnitPriceDiscount)
) AS DetailTotal,
Customer.SalesPersonID, Customer.TerritoryID,
Customer.AccountNumber,
Customer.CustomerType,
Customer.ModifiedDate, Customer.rowguid
FROM Customer INNER JOIN SalesOrderHeader
ON Customer.CustomerID = SalesOrderHeader.CustomerID
INNER JOIN SalesOrderDetail
ON SalesOrderHeader.SalesOrderID =
SalesOrderDetail.SalesOrderID
GROUP BY Customer.CustomerID, Customer.SalesPersonID,
Customer.TerritoryID, Customer.AccountNumber,
Customer.CustomerType,
Customer.ModifiedDate,Customer.rowguid
ORDER BY Customer.CustomerID
- Figure: A view to get customer totals when no denormalized fields are used
If we had a denormalized field, the user or developer would simply have run the following query: SELECT Customer.CustomerID, Customer.OrderTotal AS DetailTotal
FROM Customer
ORDER BY Customer.CustomerID
- Figure: Queries are much simpler with denormalized fields
Note that this is not a particularly complicated example. However, you can see why it can simplify development greatly when working with a large number of tables
Performance is better for read-intensive reports Particularly when reporting on data with a cube.
When there a multiple tables in a SQL Server view They cannot be updated in one hit - they must be updated one table at a time.
It is a built-in validation device For example, if records are accidentally deleted directly in the database, there is still a validation check for the correct totals. The value of this is mitigated when there is a full audit log on the database
However, there are reasons against using denormalized fields
They have to be maintained and can potentially get out of synch This can makes them unreliable - particularly if several applications are incorrectly updating the denormalized fields. UPDATE, INSERT, DELETEs are more complicated as they have to update the denormalized fields
They can be seen as an unnecessary waste of space All in all, we choose to still use denormalized fields because they can save development time. We do this with some provisos. In particular, they must be validated correctly to ensure the integrity of the data.
Here is how we ensure that this data is validated:
- Change the description on any denormalized fields to include "Denormalized" in the description - "Denormalized: Sum(OrderTotal) FROM Orders" in description in Enterprise Manager
- Create a view that lists all the denormalized fields in the database - based on the description field.
CREATE VIEW dbo.vwValidateDenormalizedFields
AS
SELECT OBJECT_NAME(id) AS TableName,
COL_NAME(id, smallid) AS ColumnName,
CAST([value] AS VARCHAR(8000)) AS Description,
'procValidate_' + OBJECT_NAME(id) +
'_' + COL_NAME(id, smallid) as
ValidationProcedureName
FROM dbo.sysproperties
WHERE (name = 'MS_Description') AND
(CAST([value] AS VARCHAR(8000))
LIKE '%Denormalized:%')
- Figure: Standard view for validation of a denormalized field
- Create a stored procedure (based on the above view) that validates whether all denormalized fields have a stored procedure that validates the data within them
CREATE PROCEDURE procValidateDenormalizedFieldValidators
AS
SELECT
ValidationProcedureName AS
MissingValidationProcedureName
FROM vwValidateDenormalizedFields
WHERE ValidationProcedureName NOT IN
(
SELECT ValidationProcedureName
FROM vwValidateDenormalizedFields AS vw
LEFT JOIN sysobjects
ON
vw.ValidationProcedureName =
OBJECT_NAME(sysobjects.id)
WHERE id IS NOT NULL
)
- Figure: Standard stored procedure for validation of a denormalized field
If you want to know how to implement denormalized fields, see our rules Do you use triggers for denormalized fields?
Many developers worry about Idempotency. They make sure that their scripts can run multiple times without it affecting the database, upon subsequent running of the script.
This usually involves a check at the start to see if the object exists or not.
eg. If this table exists, then don't create the table.
Seems popular, seems like a good idea, right? Wrong! And here is why.
Database scripts should be run in order (into separate sequential files), as per the rule Do you script out all changes?
Therefore developers should not worry about idempotency, as the script will run in the order it was created. Actually, if they are doing this, then *they want to see the errors*. It means that the database is not in the state that they expect.
IF EXISTS (SELECT 1 FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_TYPE='BASE TABLE' AND
TABLE_NAME='Employees'
)
ALTER TABLE [dbo].[Employees]( …… ) ON [PRIMARY]
ELSE
CREATE TABLE [dbo].[Employees]( …… ) ON [PRIMARY]
Bad example – worrying about the idempotency should not be done, if you plan to run your scripts in the order they were created
CREATE TABLE [dbo].[Employees](
……
) ON [PRIMARY]
Good example – not worrying about the idempotency. If errors occur we don’t want them to be hidden + it is easier to read
- Figure: Viagra isn't the cure to your Idempotency problems
See the concept of Idempotence on WikiPedia
|