SQL Seeker: performance

Monday, January 31, 2022

Index Column Order

In this post I'm going to demonstrate one of the important factors of index design, the index column order. I'm going to be using the AdventureWorks2019 sample database and we'll take a look at the IX_Person_LastName_FirstName_MiddleName non-clustered index which is on the Person.Person table.

By looking at the index properties in Management Studio I can see the columns that make up the index and they have been put in the column order of LastName, FirstName and MiddleName. It's also worth noting the name of the index matches perfectly with the columns and their order which is a great example of a great naming convention!

So how does the column order affect our queries? Let's start with the following query:

SELECT FirstName, LastName FROM Person.Person

WHERE LastName = 'Stewart';

Here we're selecting the FirstName and LastName columns from the Person.Person table where the LastName is 'Stewart', which is commonly referred to as filtering the rows. If I run the query it returns 93 rows and if I look at the execution plan (or query plan as it is also known) I can see that the optimiser has used an index seek operator on the IX_Person_LastName_FirstName_MiddleName index:

Now we'll try another query and change the WHERE clause to return rows, 87 in this example, where the FirstName = 'David' (because it's a great name):

SELECT FirstName, LastName FROM Person.Person
WHERE FirstName = 'David';

Although FirstName is present in our index as the second column our query plan is a little bit different this time:

Although we can see the optimiser is still using our index the optimiser has decided to use an index scan operator instead this time. In order to understand why this is different we have to look into not the rows returned by the engine (in this case 99) but the number of rows the engine has had to read to return those rows:

In Plan Explorer I can see that the engine has had to read through 19972 rows to return 87 however our first query that utilised the index seek operator has only had to read through 93 rows (and returned the same number):

Its also worth noting that for our index scan query we also see a Reason for Early Termination: Good Enough Plan Found which means the optimiser has retrieved the execution plan from cache, we don't see this message for the plan using an index seek as the optimiser has used a Trivial Plan instead.

This shows how a column cannot be effectively used for filtering (using the where clause in our query) unless it is the first column in an index. In the second query the index has still been used by the optimiser but it is unable to perform a more efficient seek operation and instead has had to scan (or read) through the entire index to return our results.

Now we could create an additional non-clustered index that has the FirstName column as the first column but we also have to determine the suitability of that index. If we were to create indexes for every single query on our database we'd end up with a lot of surplus indexes that are barley used which will inevitably cause a lot of overhead for things like table update operations and index maintenance.

It's much more likely for queries to be filtering on LastName so that's a perfect candidate for a non-clustered index, if we're never going to use the FirstName column in where clauses, joins etc then an index using that column isn't goign to provide any benefit which is why understanding how the columns will be used in our queries is the most important factor in designing index strategies.

Wednesday, January 26, 2022

Query Simplification: Join Removal with Foreign Keys

Foreign keys are used in database design to enforce referential integrity but they also have some performance benefits as well that you might not necessarily notice unless you're looking into your execution plans.

Let's take the following query using the AdventureWorks2019 sample database where I'm selecting the BusinessEntityID and JobTitle from the HumanResources.Employee table and by using an inner join I'm only returning rows that have matching values (BusinessEntityID) in both tables:

SELECT e.BusinessEntityID, e.JobTitle

FROM HumanResources.Employee e

INNER JOIN Person.Person p

ON e.BusinessEntityID = p.BusinessEntityID;

I might expect to see an operator that performs the inner join in the execution plan but when I look at the plan this is what I get:

Plan 1: Index Scan on HumanResources.Empoyee

Despite my query containing a join to the Person.Person table the optimiser has used an execution plan that only contains a clustered index scan of the HumanResources.Employee table. Even though the join is there in our query text it hasn't been used in the plan at all.

The reason why the join isn't present in our first query is partly down to the columns which the query is returning. I'm only selecting two columns which are both from the employee table, if I were to add a column from the Person.Person table such as in the following query then I will get a very different execution plan:

SELECT e.BusinessEntityID, e.JobTitle, p.FirstName

FROM HumanResources.Employee e

INNER JOIN Person.Person p

ON e.BusinessEntityID = p.BusinessEntityID;

Plan 2: Query with column from Person.Person

By adding this column (or indeed any column from the Person.Person table or even worse if I had used SELECT *) the execution plan has two additional operators; an index seek on the Person.Person table and this time a join operator (Nested Loop) is present in the plan. This is also a great example of why only selecting columns that you actually need in a query is really important otherwise you might be causing needless overhead.

The column selection has clearly influenced the optimisers plan selection and when referencing columns in the Person.Person table in the select the optimiser has had to come up with a plan containing the join, but why for our first query has it not done the same thing?

In this example the join removal is made possible due to a foreign key relationship between the two tables on the BusinessEntityID column. Which if you're playing along in Management Studio you can see under the HumanResources.Employee table under keys (FK_Employee_Person_BusinessEntityID).

As our inner join is essentially returning matching values from both tables and the foreign key exists on those the values the optimiser already knows that the values are matched and in turn as we're not returning any values from the Person.Person table it can avoid the cost of the join and the scan operation on the Person.Person table altogether, but we'll still get the same results.

In order to test this I'm going to remove the foreign key with the following T-SQL and then re-run the first query:

ALTER TABLE [HumanResources].[Employee] DROP CONSTRAINT [FK_Employee_Person_BusinessEntityID]

SELECT e.BusinessEntityID, e.JobTitle

FROM HumanResources.Employee e

INNER JOIN Person.Person p

ON e.BusinessEntityID = p.BusinessEntityID;

This time although we're only selecting records from the employee table we get a very different query plan that is unable to take advantage of join removal and as such has to perform an index scan on both tables which are joined using the Hash Match operator.

Plan 3: Query with no foreign key

Before I carry on I'll add the foreign key back with the following T-SQL:

ALTER TABLE [HumanResources].[Employee] WITH CHECK ADD CONSTRAINT [FK_Employee_Person_BusinessEntityID] FOREIGN KEY([BusinessEntityID])

REFERENCES [Person].[Person] ([BusinessEntityID])

ALTER TABLE [HumanResources].[Employee] CHECK CONSTRAINT [FK_Employee_Person_BusinessEntityID]

This is a great example of how foreign keys not only force referential integrity but they do have some less obvious performance benefits too that the optimiser can take advantage during query optimisation phases.

SQL Seeker

Monday, January 31, 2022

Index Column Order

Wednesday, January 26, 2022

Query Simplification: Join Removal with Foreign Keys

Breaking up with SQL Server

Search This Blog