I ran into an issue where I needed a unique index on a column that should allow multiple NULL records. Oracle supports this natively in its unique index implementation, but MS SQL does not. Fear not! I came up with a slick solution to this problem.
In MS SQL Server (2000-2012) you can create a unique index on a deterministic computed column! It took me a while to figure out how to implement this in a deterministic manner without being able to call handy functions like NEWID() or HASHBYTES.
You will need two new items in your table to make this work -
1) a column named [Guid] of type UNIQUEIDENTIFIER, with a default value of NEWID()
2) another column that you will create the unique index on, in my case I called it ReferenceIdUnique and it is a computed column.
Here is my table definition:
CREATE TABLE [dbo].[MyTable]( [MyPrimaryKeyId] [int] IDENTITY(1,1) NOT NULL, [Guid] [uniqueidentifier] NOT NULL, [ReferenceId] [int] NULL, [ReferenceIdUnique] AS ([dbo].[GenerateUniqueNULLKey]([ReferenceId],[Guid])), CONSTRAINT [PK_ProjectMaterialAssignments] PRIMARY KEY CLUSTERED ( [MyPrimaryKeyId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
And the Unique Index applied:
CREATE UNIQUE NONCLUSTERED INDEX [IX_MyTable_Unique] ON [dbo].[MyTable] ( [ReferenceIdUnique] ASC, )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
And finally, the function that makes it all possible:
-- ============================================= -- Author: Michael Brown - www.embrodesign.com -- Create date: Feb 8, 2013 -- Description: Generate a unique key for a nullable integer column -- ============================================= CREATE FUNCTION [dbo].[GenerateUniqueNULLKey] ( @ReferenceId INT, @Guid UNIQUEIDENTIFIER ) RETURNS UNIQUEIDENTIFIER WITH SCHEMABINDING AS BEGIN -- If the ReferenceId is NULL, return the unique identifier that was passed in -- If the ReferenceId has a value, return an MD5 hash of the value and cast that to a uniqueidentifier. -- The ISNULL check requires the comparison items to be of the same type, so cast both to a string before converting it to a uniqueidentifier. RETURN CONVERT(UNIQUEIDENTIFIER, CONVERT(VARBINARY(32), ISNULL(CONVERT(VARCHAR(32), @ReferenceId), CONVERT(VARCHAR(36), @Guid) )) ); END
How it works:
The function (note the WITH SCHEMABINDING to tell SQL the function is deterministic) generates a UNIQUEIDENTIFIER based on the values you pass to the function. If a NULL ReferenceId is passed in, it will return the unique Guid parameter that you passed into it. When the unique index checks the ReferenceIdUnique column NULL values will always have their own unique key that references the unique row so they appear unique to the index. However if the ReferenceId column is populated with a value, it will generate a UNIQUEIDENTIFIER based of the integer value and will also be unique. Using this method allows you to create unique indexes on composite columns that require allowing multiple NULL records along with other columns you need to include in your index – so it’s really flexible.
Improvements and things to avoid:
Why add a uniqueidentifier to your table instead of simply using your primary key? Because there may be value overlap since both fields are INT’s. For example, your primary key may have a value of 5, and the ReferenceId may have a value of 5, so when it is null the ReferenceId would equal 5 which may conflict with another valid record in your table. No good!
I tried alternatives such as using NEWID() in the function, or HASHBYTES() on the integer value but they are not deterministic so you cannot use them in this manner. Casting the integer to a VARBINARY is more efficient and it is possible then to use it to base your UNIQUEIDENTIFIER so it becomes deterministic. Of course, if anyone has any performance suggestions I’d love to hear them.
I also tried adding an INSERT trigger to check for duplicates, but this only seemed to work if the value was already in the table. If you were passing in a bulk insert (such as INSERT INTO myTable SELECT col1,col2,col3 FROM something) and it had duplicate records in it, the trigger would let them go through since it only checks the existence of the records currently in the destination table. A big drawback – but it will work for simple single inserts done into the table. The trigger isn’t the best way to deal with this anyways, both from a performance and design perspective.
What you didn’ know that? It’s Windows! You need to reboot after everything!
