MySQL 9.3 Reference Manual Including MySQL NDB Cluster 9.3
Certain optimizations are applicable to comparisons that use
the IN
(or =ANY
)
operator to test subquery results. This section discusses
these optimizations, particularly with regard to the
challenges that NULL
values present. The
last part of the discussion suggests how you can help the
optimizer.
Consider the following subquery comparison:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
MySQL evaluates queries “from outside to inside.”
That is, it first obtains the value of the outer expression
outer_expr
, and then runs the
subquery and captures the rows that it produces.
A very useful optimization is to “inform” the
subquery that the only rows of interest are those where the
inner expression inner_expr
is
equal to outer_expr
. This is done
by pushing down an appropriate equality into the
subquery's WHERE
clause to make it
more restrictive. The converted comparison looks like this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.
More generally, a comparison of N
values to a subquery that returns
N
-value rows is subject to the same
conversion. If oe_i
and
ie_i
represent corresponding outer
and inner expression values, this subquery comparison:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
Becomes:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDoe_1
=ie_1
AND ... ANDoe_N
=ie_N
)
For simplicity, the following discussion assumes a single pair of outer and inner expression values.
The “pushdown” strategy just described works if either of these conditions is true:
outer_expr
and
inner_expr
cannot be
NULL
.
You need not distinguish NULL
from
FALSE
subquery results. If the subquery
is a part of an OR
or
AND
expression in the
WHERE
clause, MySQL assumes that you do
not care. Another instance where the optimizer notices
that NULL
and FALSE
subquery results need not be distinguished is this
construct:
... WHEREouter_expr
IN (subquery
)
In this case, the WHERE
clause rejects
the row whether IN
(
returns
subquery
)NULL
or FALSE
.
Suppose that outer_expr
is known to
be a non-NULL
value but the subquery does
not produce a row such that
outer_expr
=
inner_expr
. Then
evaluates as follows:
outer_expr
IN (SELECT
...)
In this situation, the approach of looking for rows with
is no longer
valid. It is necessary to look for such rows, but if none are
found, also look for rows where
outer_expr
=
inner_expr
inner_expr
is
NULL
. Roughly speaking, the subquery can be
converted to something like this:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND (outer_expr
=inner_expr
ORinner_expr
IS NULL))
The need to evaluate the extra IS
NULL
condition is why MySQL has the
ref_or_null
access method:
mysql>EXPLAIN
SELECT
outer_expr
IN (SELECT t2.maybe_null_keyFROM t2, t3 WHERE ...)
FROM t1;
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: ref_or_null possible_keys: maybe_null_key key: maybe_null_key key_len: 5 ref: func rows: 2 Extra: Using where; Using index ...
The unique_subquery
and
index_subquery
subquery-specific access methods also have “or
NULL
” variants.
The additional OR ... IS NULL
condition
makes query execution slightly more complicated (and some
optimizations within the subquery become inapplicable), but
generally this is tolerable.
The situation is much worse when
outer_expr
can be
NULL
. According to the SQL interpretation
of NULL
as “unknown value,”
NULL IN (SELECT
should evaluate to:
inner_expr
...)
For proper evaluation, it is necessary to be able to check
whether the SELECT
has produced
any rows at all, so
cannot be
pushed down into the subquery. This is a problem because many
real world subqueries become very slow unless the equality can
be pushed down.
outer_expr
=
inner_expr
Essentially, there must be different ways to execute the
subquery depending on the value of
outer_expr
.
The optimizer chooses SQL compliance over speed, so it
accounts for the possibility that
outer_expr
might be
NULL
:
If outer_expr
is
NULL
, to evaluate the following
expression, it is necessary to execute the
SELECT
to determine whether
it produces any rows:
NULL IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
It is necessary to execute the original
SELECT
here, without any
pushed-down equalities of the kind mentioned previously.
On the other hand, when
outer_expr
is not
NULL
, it is absolutely essential that
this comparison:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
Be converted to this expression that uses a pushed-down condition:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
ANDouter_expr
=inner_expr
)
Without this conversion, subqueries are slow.
To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within “trigger” functions. Thus, an expression of the following form:
outer_expr
IN (SELECTinner_expr
FROM ... WHEREsubquery_where
)
Is converted into:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(outer_expr
=inner_expr
))
More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:
(oe_1
, ...,oe_N
) IN (SELECTie_1
, ...,ie_N
FROM ... WHEREsubquery_where
)
And converts it to this expression:
EXISTS (SELECT 1 FROM ... WHEREsubquery_where
AND trigcond(oe_1
=ie_1
) AND ... AND trigcond(oe_N
=ie_N
) )
Each trigcond(
is a special function that evaluates to the following values:
X
)
X
when the
“linked” outer expression
oe_i
is not
NULL
TRUE
when the “linked”
outer expression oe_i
is
NULL
Trigger functions are not triggers of
the kind that you create with CREATE
TRIGGER
.
Equalities that are wrapped within
trigcond()
functions are not first class
predicates for the query optimizer. Most optimizations cannot
deal with predicates that may be turned on and off at query
execution time, so they assume any
trigcond(
to
be an unknown function and ignore it. Triggered equalities can
be used by those optimizations:
X
)
Reference optimizations:
trigcond(
can
be used to construct
X
=Y
[OR Y
IS NULL])ref
,
eq_ref
, or
ref_or_null
table
accesses.
Index lookup-based subquery execution engines:
trigcond(
can be used to construct
X
=Y
)unique_subquery
or
index_subquery
accesses.
Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.
When the optimizer uses a triggered condition to create some
kind of index lookup-based access (as for the first two items
of the preceding list), it must have a fallback strategy for
the case when the condition is turned off. This fallback
strategy is always the same: Do a full table scan. In
EXPLAIN
output, the fallback
shows up as Full scan on NULL key
in the
Extra
column:
mysql>EXPLAIN SELECT t1.col1,
t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: t1 ... *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: t2 type: index_subquery possible_keys: key1 key: key1 key_len: 5 ref: func rows: 2 Extra: Using where; Full scan on NULL key
If you run EXPLAIN
followed by
SHOW WARNINGS
, you can see the
triggered condition:
*************************** 1. row *************************** Level: Note Code: 1003 Message: select `test`.`t1`.`col1` AS `col1`, <in_optimizer>(`test`.`t1`.`col1`, <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2 on key1 checking NULL where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)` from `test`.`t1`
The use of triggered conditions has some performance
implications. A NULL IN (SELECT ...)
expression now may cause a full table scan (which is slow)
when it previously did not. This is the price paid for correct
results (the goal of the trigger-condition strategy is to
improve compliance, not speed).
For multiple-table subqueries, execution of NULL IN
(SELECT ...)
is particularly slow because the join
optimizer does not optimize for the case where the outer
expression is NULL
. It assumes that
subquery evaluations with NULL
on the left
side are very rare, even if there are statistics that indicate
otherwise. On the other hand, if the outer expression might be
NULL
but never actually is, there is no
performance penalty.
To help the query optimizer better execute your queries, use these suggestions:
Declare a column as NOT NULL
if it
really is. This also helps other aspects of the optimizer
by simplifying condition testing for the column.
If you need not distinguish a NULL
from
FALSE
subquery result, you can easily
avoid the slow execution path. Replace a comparison that
looks like this:
outer_expr
[NOT] IN (SELECTinner_expr
FROM ...)
with this expression:
(outer_expr
IS NOT NULL) AND (outer_expr
[NOT] IN (SELECTinner_expr
FROM ...))
Then NULL IN (SELECT ...)
is never
evaluated because MySQL stops evaluating
AND
parts as soon as the
expression result is clear.
Another possible rewrite:
[NOT] EXISTS (SELECTinner_expr
FROM ... WHEREinner_expr
=outer_expr
)
The
subquery_materialization_cost_based
flag of the optimizer_switch
system variable enables control over the choice between
subquery materialization and
IN
-to-EXISTS
subquery
transformation. See
Section 10.9.2, “Switchable Optimizations”.