Showing posts with label indexing. Show all posts
Showing posts with label indexing. Show all posts

Wednesday, February 2, 2022

Duplicate Indexes - Which index does the Query Optimiser use?

It's well documented that duplicate indexes are literally a waste of space so I'm not going to blog about that, here's a post from Kevin Hill that covers what you need to know:

Duplicate Indexes Explained - DallasDBAs.com

Lets take the following query that I will run against the AdventureWorks2019 sample database:

SELECT FirstName, LastName, MiddleName
FROM Person.Person p
WHERE FirstName = 'Allison' AND LastName = 'Stewart'
OPTION (RECOMPILE);

Here's the plan:

As there is an index that has the LastName and FirstName columns the optimiser has opted for an index seek operator using the IX_Person_LastName_FirstName_MiddleName index, and if I look into the Plan XML I can see that it's using a trivial plan: StatementOptmLevel="TRIVIAL".

This basically means there's one obvious way to return the query results so the optimiser has avoided the cost of going through full optimisation and has elected to use this plan straightaway.

So what happens if I create an identical copy of that particular index, in fact let's create five indexes that are exactly the same:

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_1] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_2] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_3] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_4] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName_MiddleName_5] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC,
[MiddleName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO

As a side thought; wouldn't it be great if Clippy appeared and told me I already had an index with the same columns and that these weren't needed? It would be great, but SQL doesn't check for these things so the duplicate indexes get created straight with no fuss.

Now the first thing I wondered was if the optimiser would not use full optimisation because it has additional indexes to think about but no, looking at the Plan XML I see that a trivial plan has been used again: StatementOptmLevel="TRIVIAL". So what index has it used?


As we can see from the execution plan this time the optimiser has used the last index that I created; IX_Person_LastName_FirstName_MiddleName_5. Why? Well in all honesty I don't know, is it because the optimiser uses the very latest index by default. I'm not sure. But let's delete the index and see what happens:

DROP INDEX [IX_Person_LastName_FirstName_MiddleName_5] ON [Person].[Person]
GO

When I run the query it has again used the "latest" duplicate index:


I did wonder if it's in some way limited to trivial plans so I added a join to my query to give the optimiser something to think about and use full optimisation:

SELECT FirstName, LastName, MiddleName, b.BusinessEntityID 
FROM Person.Person p
INNER JOIN Person.BusinessEntity b ON p.BusinessEntityID = b.BusinessEntityID
WHERE FirstName = 'Allison' AND LastName = 'Stewart'
OPTION (RECOMPILE);

This time the Plan XML shows StatementOptmLevel="FULL" so the optimiser has had to do a bit of extra work, and if we look at the execution plan again we can see it's using the "latest" index again:


Now I'm going to have to do some more digging to see if the optimiser always uses the most recent or latest index, maybe it's default behaviour - if you know then please let me know!

And to clean things up I'll remove all those duplicate indexes before writing any more queries!

Monday, January 31, 2022

Index Column Order

In this post I'm going to demonstrate one of the important factors of index design, the index column order. I'm going to be using the AdventureWorks2019 sample database and we'll take a look at the IX_Person_LastName_FirstName_MiddleName non-clustered index which is on the Person.Person table.

By looking at the index properties in Management Studio I can see the columns that make up the index and they have been put in the column order of LastName, FirstName and MiddleName. It's also worth noting the name of the index matches perfectly with the columns and their order which is a great example of a great naming convention!  


So how does the column order affect our queries? Let's start with the following query:

SELECT FirstName, LastName FROM Person.Person
WHERE LastName = 'Stewart';

Here we're selecting the FirstName and LastName columns from the Person.Person table where the LastName is 'Stewart', which is commonly referred to as filtering the rows. If I run the query it returns 93 rows and if I look at the execution plan (or query plan as it is also known) I can see that the optimiser has used an index seek operator on the IX_Person_LastName_FirstName_MiddleName index:

Now we'll try another query and change the WHERE clause to return rows, 87 in this example, where the FirstName = 'David' (because it's a great name):

SELECT FirstName, LastName FROM Person.Person
WHERE FirstName = 'David';

Although FirstName is present in our index as the second column our query plan is a little bit different this time:

Although we can see the optimiser is still using our index the optimiser has decided to use an index scan operator instead this time. In order to understand why this is different we have to look into not the rows returned by the engine (in this case 99) but the number of rows the engine has had to read to return those rows:

In Plan Explorer I can see that the engine has had to read through 19972 rows to return 87 however our first query that utilised the index seek operator has only had to read through 93 rows (and returned the same number):

Its also worth noting that for our index scan query we also see a Reason for Early Termination: Good Enough Plan Found which means the optimiser has retrieved the execution plan from cache, we don't see this message for the plan using an index seek as the optimiser has used a Trivial Plan instead.

This shows how a column cannot be effectively used for filtering (using the where clause in our query) unless it is the first column in an index. In the second query the index has still been used by the optimiser but it is unable to perform a more efficient seek operation and instead has had to scan (or read) through the entire index to return our results.

Now we could create an additional non-clustered index that has the FirstName column as the first column but we also have to determine the suitability of that index. If we were to create indexes for every single query on our database we'd end up with a lot of surplus indexes that are barley used which will inevitably cause a lot of overhead for things like table update operations and index maintenance. 

It's much more likely for queries to be filtering on LastName so that's a perfect candidate for a non-clustered index, if we're never going to use the FirstName column in where clauses, joins etc then an index using that column isn't goign to provide any benefit which is why understanding how the columns will be used in our queries is the most important factor in designing index strategies.

Breaking up with SQL Server

I was inspired to write this after reading a post from Dave Mason regarding breaking up with Big Tech companies. Yet again I haven't wr...