Wednesday, February 2, 2022

Duplicate Indexes - Which index does the Query Optimiser use?

It's well documented that duplicate indexes are literally a waste of space so I'm not going to blog about that, here's a post from Kevin Hill that covers what you need to know:

Duplicate Indexes Explained - DallasDBAs.com

Lets take the following query that I will run against the AdventureWorks2019 sample database:

SELECT FirstName, LastName, MiddleName
FROM Person.Person p
WHERE FirstName = 'Allison' AND LastName = 'Stewart'
OPTION (RECOMPILE);

Here's the plan:

As there is an index that has the LastName and FirstName columns the optimiser has opted for an index seek operator using the IX_Person_LastName_FirstName_MiddleName index, and if I look into the Plan XML I can see that it's using a trivial plan: StatementOptmLevel="TRIVIAL".

This basically means there's one obvious way to return the query results so the optimiser has avoided the cost of going through full optimisation and has elected to use this plan straightaway.

So what happens if I create an identical copy of that particular index, in fact let's create five indexes that are exactly the same:

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_1] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_2] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_3] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_4] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_5] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

As a side thought; wouldn't it be great if Clippy appeared and told me I already had an index with the same columns and that these weren't needed? It would be great, but SQL doesn't check for these things so the duplicate indexes get created straight with no fuss.

Now the first thing I wondered was if the optimiser would not use full optimisation because it has additional indexes to think about but no, looking at the Plan XML I see that a trivial plan has been used again: StatementOptmLevel="TRIVIAL". So what index has it used?


As we can see from the execution plan this time the optimiser has used the last index that I created; IX_Person_LastName_FirstName_MiddleName_5. Why? Well in all honesty I don't know, is it because the optimiser uses the very latest index by default. I'm not sure. But let's delete the index and see what happens:

DROP INDEX [IX_Person_LastName_FirstName_MiddleName_5] ON [Person].[Person]
GO

When I run the query it has again used the "latest" duplicate index:


I did wonder if it's in some way limited to trivial plans so I added a join to my query to give the optimiser something to think about and use full optimisation:

SELECT FirstName, LastName, MiddleName, b.BusinessEntityID 
FROM Person.Person p
INNER JOIN Person.BusinessEntity b ON p.BusinessEntityID = b.BusinessEntityID
WHERE FirstName = 'Allison' AND LastName = 'Stewart'
OPTION (RECOMPILE);

This time the Plan XML shows StatementOptmLevel="FULL" so the optimiser has had to do a bit of extra work, and if we look at the execution plan again we can see it's using the "latest" index again:


Now I'm going to have to do some more digging to see if the optimiser always uses the most recent or latest index, maybe it's default behaviour - if you know then please let me know!

And to clean things up I'll remove all those duplicate indexes before writing any more queries!

Breaking up with SQL Server

I was inspired to write this after reading a post from Dave Mason regarding breaking up with Big Tech companies. Yet again I haven't wr...