Wednesday, January 26, 2022

Query Simplification: Join Removal with Foreign Keys

Foreign keys are used in database design to enforce referential integrity but they also have some performance benefits as well that you might not necessarily notice unless you're looking into your execution plans.

Let's take the following query using the AdventureWorks2019 sample database where I'm selecting the BusinessEntityID and JobTitle from the HumanResources.Employee table and by using an inner join I'm only returning rows that have matching values (BusinessEntityID) in both tables:

SELECT e.BusinessEntityID, e.JobTitle
FROM HumanResources.Employee e
INNER JOIN Person.Person p 
ON e.BusinessEntityID = p.BusinessEntityID;

I might expect to see an operator that performs the inner join in the execution plan but when I look at the plan this is what I get:

Plan 1: Index Scan on HumanResources.Empoyee


Despite my query containing a join to the Person.Person table the optimiser has used an execution plan that only contains a clustered index scan of the HumanResources.Employee table. Even though the join is there in our query text it hasn't been used in the plan at all.

The reason why the join isn't present in our first query is partly down to the columns which the query is returning. I'm only selecting two columns which are both from the employee table, if I were to add a column from the Person.Person table such as in the following query then I will get a very different execution plan:

SELECT e.BusinessEntityID, e.JobTitle, p.FirstName
FROM HumanResources.Employee e
INNER JOIN Person.Person p 
ON e.BusinessEntityID = p.BusinessEntityID;


Plan 2: Query with column from Person.Person

By adding this column (or indeed any column from the Person.Person table or even worse if I had used SELECT *) the execution plan has two additional operators; an index seek on the Person.Person table and this time a join operator (Nested Loop) is present in the plan. This is also a great example of why only selecting columns that you actually need in a query is really important otherwise you might be causing needless overhead.

The column selection has clearly influenced the optimisers plan selection and when referencing columns in the Person.Person table in the select the optimiser has had to come up with a plan containing the join, but why for our first query has it not done the same thing? 

In this example the join removal is made possible due to a foreign key relationship between the two tables on the BusinessEntityID column. Which if you're playing along in Management Studio you can see under the HumanResources.Employee table under keys (FK_Employee_Person_BusinessEntityID).

As our inner join is essentially returning matching values from both tables and the foreign key exists on those the values the optimiser already knows that the values are matched and in turn as we're not returning any values from the Person.Person table it can avoid the cost of the join and the scan operation on the Person.Person table altogether, but we'll still get the same results.

In order to test this I'm going to remove the foreign key with the following T-SQL and then re-run the first query:

ALTER TABLE [HumanResources].[Employee] DROP CONSTRAINT [FK_Employee_Person_BusinessEntityID]
GO

SELECT e.BusinessEntityID, e.JobTitle
FROM HumanResources.Employee e
INNER JOIN Person.Person p 
ON e.BusinessEntityID = p.BusinessEntityID;

This time although we're only selecting records from the employee table we get a very different query plan that is unable to take advantage of join removal and as such has to perform an index scan on both tables which are joined using the Hash Match operator. 

Plan 3: Query with no foreign key



Before I carry on I'll add the foreign key back with the following T-SQL:

ALTER TABLE [HumanResources].[Employee]  WITH CHECK ADD  CONSTRAINT [FK_Employee_Person_BusinessEntityID] FOREIGN KEY([BusinessEntityID])
REFERENCES [Person].[Person] ([BusinessEntityID])
GO

ALTER TABLE [HumanResources].[Employee] CHECK CONSTRAINT [FK_Employee_Person_BusinessEntityID]
GO

This is a great example of how foreign keys not only force referential integrity but they do have some less obvious performance benefits too that the optimiser can take advantage during query optimisation phases. 

No comments:

Post a Comment

Breaking up with SQL Server

I was inspired to write this after reading a post from Dave Mason regarding breaking up with Big Tech companies. Yet again I haven't wr...