Hash Join Vs Nested Join: Forcing physical joins

Whenever we are using JOIN, SQL Server has three physical methods of doing this, which are Nested Join, Merge Join and Hash Join.

In simplest term Nested Join is used when SQL Server estimates less number of rows SQL Server gets to know that the number of rows returning from both the part of the Join will not be large. In this case for each row from one table it will do complete scan on the other for match.

Merge Join is generally used when the tables are pre-sorted on the column in which the Join condition is used. For every row in 1 table it compares it with another table and matching rows are moved into the result set. The data being already sorted out makes the comparison and search easy and fast.

Hash Join is used when the tables are big and they are also not sorted based on the Join column. The hash is calculated for both the tables one is stored in the memory and another matched with the hash table created from the hashes generated from key columns.

99.9% of time SQL Server chooses the appropriate physical join to find out the output it needs and the only time it is not able to choose the correct Join to use is always because of wrong estimates. The reason of wrong of estimates could be many.

Having said that, today we faced one such issue. The query which was running had 11 Joins, all Inner Joins. This query was taking around 3 minutes to run for a particular set of parameters.

All the estimates were haywire, the reason being the application team “had” to use multiple converts and that too at table end. Something like below:

 CONVERT(VARCHAR(10), Table.clm, 120) <= '2020-03-20' )

The above conversion was being used in WHERE clause at three different places and because of that SQL Server was not able to design the appropriate Query plan.

In the actual plan it was using Nested Loop and reiterating over a table.

The reason for SQL Server choosing it was because of the wrong estimates.

We were not allowed to remove the explicit conversions, but we figured out that this Nested Loop is the result of the explicit conversion and to avoid the nested join we used the Hash Join in the query itself.

INNER HASH JOIN Table.Clm1 On...

Once we implemented this, it resulted into not just the change of Join from Nested to Hash Join but also the whole query execution changed and the time for execution dropped down to <1 sec from the 2 minute time.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s