sqlLogical clause execution order Window functions - inner behaviors and optimization
NULLs cause the usual query gotchas. People end up trying to memorize null interactions. But you don't need to memorize anything. Simply understand NULL, and it will immediately make sense in all contexts. Known or unknown.
WARNING - There are some edge cases to the logic I've given below (Eg - SQL Server allows comparison of dates with null). But the logic generally holds well.
What the engine thinks about null
To the engine, Null means unknown.
Imagine I took a value and put it in a black box. Now, you don't know what the value in the box is. So now:
- You cannot compare any value to the unknown value in the black box
- For example, if I ask you - is the value in the box equal to 100? - you cannot tell. It is an unknown to you.
- You can group together unknowns.
- Suppose I give you 10 values: 5 are normal values, 5 are values in black boxes. Now you don't know what the value in the boxes are, but you can still group together all the unknown values in the boxes. You just take all these black boxes and throw them into a group.
This is all you need to understand. You are now thinking like the database engine.
To the engine, nulls are unknown values. Values in a black box. So naturally, a few things will happen:
- Any operation with an unknown will give an unknown result.
Nulland so on.
- Any comparison with an unknown will fail.
Null == Nullis
false. Why? because how can you compare two unknowns!
1 == Nullis
false. Why? because you cannot compare with an unknown.
Not sure about a null interaction and want to test it in your database engine? Just run a query like this:
SELECT 1+NULL; -- if that didn't work, select from dual SELECT 1+NULL FROM dual;
Now, just look at unknowns (Nulls) in various SQL related contexts, and all those tricky null interactions in SQL will make sense:
Any operation with unknown is going to give unknown
SELECT 1+NULL will give the output:
null values are ignored.
Because you cannot aggregate unknowns. The answer will be unknown, so it will be useless.
Eg: you cannot calculate the SUM of unknowns. So the database engine will simply ignore all nulls when it is calculating the SUM over a column.
Join operation is basically comparing the values in 2 columns, and combining the rows if the values are equal.
So now, what if one column of the join condition has an unknown, or both columns have unknowns?
You cannot decide equality with an unknown. You just cannot compare. So, the database engine will not join the rows when either column has an unknown value (Null).
In joins, rows with Null values in the join condition are ignored
This is also a good interview question. If you feel that the candidate has simply memorized the answer, follow up with a simple question: "Why?"
Suppose the query is: SELECT * FROM person where name=NULL
Here, you are asking name to be compared with an unknown. So the equality is going to fail.
Why? because you cannot compare something with an unknown. So you cannot determine equality with an unknown.
NULL is not equal to NULL
If you need to do this, your db engine will provide other ways to check this. Examples:
SELECT * FROM person WHERE name IS NULL SELECT * FROM person WHERE name IS NOT NULL
You get ifNull(), isNull(), coalesce(), nvl() etc to deal with these situations. Do null comparisons through these functions.
SORT BY clause
NULLs are unknowns. So the engine can differentiate between unknowns and actual values. So the engines will group together all the nulls and put them at the top or bottom of the sorted results.
GROUP BY clause
NULLs are unknowns.
You told the engine: group together all values that are the same.
So now, the engine will take all unknowns, treat them as the same category, and group them together. Everyone who is an unknown will get grouped into the same category.