MySQL 9.3 Reference Manual Including MySQL NDB Cluster 9.3
The range
access method
uses a single index to retrieve a subset of table rows that
are contained within one or several index value intervals. It
can be used for a single-part or multiple-part index. The
following sections describe conditions under which the
optimizer uses range access.
For a single-part index, index value intervals can be
conveniently represented by corresponding conditions in the
WHERE
clause, denoted as
range conditions
rather than “intervals.”
The definition of a range condition for a single-part index is as follows:
For both BTREE
and
HASH
indexes, comparison of a key
part with a constant value is a range condition when
using the
=
,
<=>
,
IN()
, IS
NULL
, or IS NOT
NULL
operators.
Additionally, for BTREE
indexes,
comparison of a key part with a constant value is a
range condition when using the
>
,
<
,
>=
,
<=
,
BETWEEN
,
!=
,
or
<>
operators, or LIKE
comparisons if the argument to
LIKE
is a constant string
that does not start with a wildcard character.
For all index types, multiple range conditions combined
with OR
or
AND
form a range condition.
“Constant value” in the preceding descriptions means one of the following:
Here are some examples of queries with range conditions in
the WHERE
clause:
SELECT * FROM t1 WHEREkey_col
> 1 ANDkey_col
< 10; SELECT * FROM t1 WHEREkey_col
= 1 ORkey_col
IN (15,18,20); SELECT * FROM t1 WHEREkey_col
LIKE 'ab%' ORkey_col
BETWEEN 'bar' AND 'foo';
Some nonconstant values may be converted to constants during the optimizer constant propagation phase.
MySQL tries to extract range conditions from the
WHERE
clause for each of the possible
indexes. During the extraction process, conditions that
cannot be used for constructing the range condition are
dropped, conditions that produce overlapping ranges are
combined, and conditions that produce empty ranges are
removed.
Consider the following statement, where
key1
is an indexed column and
nonkey
is not indexed:
SELECT * FROM t1 WHERE (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z');
The extraction process for key key1
is as
follows:
Start with original WHERE
clause:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR (key1 < 'bar' AND nonkey = 4) OR (key1 < 'uux' AND key1 > 'z')
Remove nonkey = 4
and key1
LIKE '%b'
because they cannot be used for a
range scan. The correct way to remove them is to replace
them with TRUE
, so that we do not
miss any matching rows when doing the range scan.
Replacing them with TRUE
yields:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR (key1 < 'bar' AND TRUE) OR (key1 < 'uux' AND key1 > 'z')
Collapse conditions that are always true or false:
(key1 LIKE 'abcde%' OR TRUE)
is
always true
(key1 < 'uux' AND key1 >
'z')
is always false
Replacing these conditions with constants yields:
(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)
Removing unnecessary TRUE
and
FALSE
constants yields:
(key1 < 'abc') OR (key1 < 'bar')
Combining overlapping intervals into one yields the final condition to be used for the range scan:
(key1 < 'bar')
In general (and as demonstrated by the preceding example),
the condition used for a range scan is less restrictive than
the WHERE
clause. MySQL performs an
additional check to filter out rows that satisfy the range
condition but not the full WHERE
clause.
The range condition extraction algorithm can handle nested
AND
/OR
constructs of arbitrary depth, and its output does not
depend on the order in which conditions appear in
WHERE
clause.
MySQL does not support merging multiple ranges for the
range
access method for
spatial indexes. To work around this limitation, you can use
a UNION
with identical
SELECT
statements, except
that you put each spatial predicate in a different
SELECT
.
Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.
For example, consider a multiple-part index defined as
key1(
, and the
following set of key tuples listed in key order:
key_part1
,
key_part2
,
key_part3
)
key_part1
key_part2
key_part3
NULL 1 'abc' NULL 1 'xyz' NULL 2 'foo' 1 1 'abc' 1 1 'xyz' 1 2 'abc' 2 1 'aaa'
The condition
defines this interval:
key_part1
= 1
(1,-inf,-inf) <= (key_part1
,key_part2
,key_part3
) < (1,+inf,+inf)
The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.
By contrast, the condition
does not define a single interval and cannot
be used by the range access method.
key_part3
=
'abc'
The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.
For HASH
indexes, each interval
containing identical values can be used. This means that
the interval can be produced only for conditions in the
following form:
key_part1
cmp
const1
ANDkey_part2
cmp
const2
AND ... ANDkey_partN
cmp
constN
;
Here, const1
,
const2
, … are
constants, cmp
is one of the
=
,
<=>
,
or IS NULL
comparison
operators, and the conditions cover all index parts.
(That is, there are N
conditions, one for each part of an
N
-part index.) For example,
the following is a range condition for a three-part
HASH
index:
key_part1
= 1 ANDkey_part2
IS NULL ANDkey_part3
= 'foo'
For the definition of what is considered to be a constant, see Range Access Method for Single-Part Indexes.
For a BTREE
index, an interval might
be usable for conditions combined with
AND
, where each condition
compares a key part with a constant value using
=
,
<=>
,
IS NULL
,
>
,
<
,
>=
,
<=
,
!=
,
<>
,
BETWEEN
, or
LIKE
'
(where
pattern
''
does not start with a wildcard). An interval can be used
as long as it is possible to determine a single key
tuple containing all rows that match the condition (or
two intervals if
pattern
'<>
or !=
is used).
The optimizer attempts to use additional key parts to
determine the interval as long as the comparison
operator is
=
,
<=>
,
or IS NULL
. If the operator
is
>
,
<
,
>=
,
<=
,
!=
,
<>
,
BETWEEN
, or
LIKE
, the
optimizer uses it but considers no more key parts. For
the following expression, the optimizer uses
=
from
the first comparison. It also uses
>=
from the second comparison but considers no further key
parts and does not use the third comparison for interval
construction:
key_part1
= 'foo' ANDkey_part2
>= 10 ANDkey_part3
> 10
The single interval is:
('foo',10,-inf) < (key_part1
,key_part2
,key_part3
) < ('foo',+inf,+inf)
It is possible that the created interval contains more
rows than the initial condition. For example, the
preceding interval includes the value ('foo',
11, 0)
, which does not satisfy the original
condition.
If conditions that cover sets of rows contained within
intervals are combined with
OR
, they form a condition
that covers a set of rows contained within the union of
their intervals. If the conditions are combined with
AND
, they form a condition
that covers a set of rows contained within the
intersection of their intervals. For example, for this
condition on a two-part index:
(key_part1
= 1 ANDkey_part2
< 2) OR (key_part1
> 5)
The intervals are:
(1,-inf) < (key_part1
,key_part2
) < (1,2) (5,-inf) < (key_part1
,key_part2
)
In this example, the interval on the first line uses one
key part for the left bound and two key parts for the
right bound. The interval on the second line uses only
one key part. The key_len
column in
the EXPLAIN
output
indicates the maximum length of the key prefix used.
In some cases, key_len
may indicate
that a key part was used, but that might be not what you
would expect. Suppose that
key_part1
and
key_part2
can be
NULL
. Then the
key_len
column displays two key part
lengths for the following condition:
key_part1
>= 1 ANDkey_part2
< 2
But, in fact, the condition is converted to this:
key_part1
>= 1 ANDkey_part2
IS NOT NULL
For a description of how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index, see Range Access Method for Single-Part Indexes. Analogous steps are performed for range conditions on multiple-part indexes.
Consider these expressions, where
col_name
is an indexed column:
col_name
IN(val1
, ...,valN
)col_name
=val1
OR ... ORcol_name
=valN
Each expression is true if
col_name
is equal to any of
several values. These comparisons are equality range
comparisons (where the “range” is a single
value). The optimizer estimates the cost of reading
qualifying rows for equality range comparisons as follows:
If there is a unique index on
col_name
, the row estimate
for each range is 1 because at most one row can have the
given value.
Otherwise, any index on
col_name
is nonunique and the
optimizer can estimate the row count for each range
using dives into the index or index statistics.
With index dives, the optimizer makes a dive at each end of
a range and uses the number of rows in the range as the
estimate. For example, the expression
has three equality ranges and the optimizer
makes two dives per range to generate a row estimate. Each
pair of dives yields an estimate of the number of rows that
have the given value.
col_name
IN (10, 20,
30)
Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.
The
eq_range_index_dive_limit
system variable enables you to configure the number of
values at which the optimizer switches from one row
estimation strategy to the other. To permit use of index
dives for comparisons of up to N
equality ranges, set
eq_range_index_dive_limit
to N
+ 1. To disable use of
statistics and always use index dives regardless of
N
, set
eq_range_index_dive_limit
to 0.
To update table index statistics for best estimates, use
ANALYZE TABLE
.
Prior to MySQL 9.3, there is no way of skipping
the use of index dives to estimate index usefulness, except
by using the
eq_range_index_dive_limit
system variable. In MySQL 9.3, index dive
skipping is possible for queries that satisfy all these
conditions:
The query is for a single table, not a join on multiple tables.
A single-index FORCE INDEX
index hint
is present. The idea is that if index use is forced,
there is nothing to be gained from the additional
overhead of performing dives into the index.
The index is nonunique and not a
FULLTEXT
index.
No subquery is present.
No DISTINCT
, GROUP
BY
, or ORDER BY
clause is
present.
For EXPLAIN FOR
CONNECTION
, the output changes as follows if index
dives are skipped:
For traditional output, the rows
and
filtered
values are
NULL
.
For JSON output,
rows_examined_per_scan
and
rows_produced_per_join
do not appear,
skip_index_dive_due_to_force
is
true
, and cost calculations are not
accurate.
Without FOR CONNECTION
,
EXPLAIN
output does not
change when index dives are skipped.
After execution of a query for which index dives are
skipped, the corresponding row in the Information Schema
OPTIMIZER_TRACE
table contains
an index_dives_for_range_access
value of
skipped_due_to_force_index
.
Consider the following scenario:
CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2)); INSERT INTO t1 VALUES (1,1), (1,2), (1,3), (1,4), (1,5), (2,1), (2,2), (2,3), (2,4), (2,5); INSERT INTO t1 SELECT f1, f2 + 5 FROM t1; INSERT INTO t1 SELECT f1, f2 + 10 FROM t1; INSERT INTO t1 SELECT f1, f2 + 20 FROM t1; INSERT INTO t1 SELECT f1, f2 + 40 FROM t1; ANALYZE TABLE t1; EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;
To execute this query, MySQL can choose an index scan to
fetch all rows (the index includes all columns to be
selected), then apply the f2 > 40
condition from the WHERE
clause to
produce the final result set.
A range scan is more efficient than a full index scan, but
cannot be used in this case because there is no condition on
f1
, the first index column. The optimizer
can perform multiple range scans, one for each value of
f1
, using a method called Skip Scan that
is similar to Loose Index Scan (see
Section 10.2.1.17, “GROUP BY Optimization”):
Skip between distinct values of the first index part,
f1
(the index prefix).
Perform a subrange scan on each distinct prefix value
for the f2 > 40
condition on the
remaining index part.
For the data set shown earlier, the algorithm operates like this:
Get the first distinct value of the first key part
(f1 = 1
).
Construct the range based on the first and second key
parts (f1 = 1 AND f2 > 40
).
Perform a range scan.
Get the next distinct value of the first key part
(f1 = 2
).
Construct the range based on the first and second key
parts (f1 = 2 AND f2 > 40
).
Perform a range scan.
Using this strategy decreases the number of accessed rows because MySQL skips the rows that do not qualify for each constructed range. This Skip Scan access method is applicable under the following conditions:
Table T has at least one compound index with key parts
of the form ([A_1, ..., A_k
,]
B_1, ..., B_m
, C [, D_1, ...,
D_n
]). Key parts A and D may
be empty, but B and C must be nonempty.
The query references only one table.
The query does not use GROUP BY
or
DISTINCT
.
The query references only columns in the index.
The predicates on A_1, ...,
A_k
must be equality
predicates and they must be constants. This includes the
IN()
operator.
The query must be a conjunctive query; that is, an
AND
of OR
conditions:
(
cond1
(key_part1
)
OR
cond2
(key_part1
))
AND
(cond1
(key_part2
)
OR ...) AND ...
There must be a range condition on C.
Conditions on D columns are permitted. Conditions on D must be in conjunction with the range condition on C.
Use of Skip Scan is indicated in EXPLAIN
output as follows:
Using index for skip scan
in the
Extra
column indicates that the loose
index Skip Scan access method is used.
If the index can be used for Skip Scan, the index should
be visible in the possible_keys
column.
Use of Skip Scan is indicated in optimizer trace output by a
"skip scan"
element of this form:
"skip_scan_range": { "type": "skip_scan", "index":index_used_for_skip_scan
, "key_parts_used_for_access": [key_parts_used_for_access
], "range": [range
] }
You may also see a
"best_skip_scan_summary"
element. If Skip
Scan is chosen as the best range access variant, a
"chosen_range_access_summary"
is written.
If Skip Scan is chosen as the overall best access method, a
"best_access_path"
element is present.
Use of Skip Scan is subject to the value of the
skip_scan
flag of the
optimizer_switch
system
variable. See Section 10.9.2, “Switchable Optimizations”. By
default, this flag is on
. To disable it,
set skip_scan
to
off
.
In addition to using the
optimizer_switch
system
variable to control optimizer use of Skip Scan session-wide,
MySQL supports optimizer hints to influence the optimizer on
a per-statement basis. See
Section 10.9.3, “Optimizer Hints”.
The optimizer is able to apply the range scan access method to queries of this form:
SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));
Previously, for range scans to be used, it was necessary to write the query as:
SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' ) OR ( col_1 = 'c' AND col_2 = 'd' );
For the optimizer to use a range scan, queries must satisfy these conditions:
On the left side of the
IN()
predicate, the row
constructor contains only column references.
On the right side of the
IN()
predicate, row
constructors contain only runtime constants, which are
either literals or local column references that are
bound to constants during execution.
On the right side of the
IN()
predicate, there is
more than one row constructor.
For more information about the optimizer and row constructors, see Section 10.2.1.22, “Row Constructor Expression Optimization”
To control the memory available to the range optimizer, use
the
range_optimizer_max_mem_size
system variable:
A value of 0 means “no limit.”
With a value greater than 0, the optimizer tracks the
memory consumed when considering the range access
method. If the specified limit is about to be exceeded,
the range access method is abandoned and other methods,
including a full table scan, are considered instead.
This could be less optimal. If this happens, the
following warning occurs (where
N
is the current
range_optimizer_max_mem_size
value):
Warning 3170 Memory capacity of N
bytes for
'range_optimizer_max_mem_size' exceeded. Range
optimization was not done for this query.
For UPDATE
and
DELETE
statements, if the
optimizer falls back to a full table scan and the
sql_safe_updates
system
variable is enabled, an error occurs rather than a
warning because, in effect, no key is used to determine
which rows to modify. For more information, see
Using Safe-Updates Mode (--safe-updates).
For individual queries that exceed the available range
optimization memory and for which the optimizer falls back
to less optimal plans, increasing the
range_optimizer_max_mem_size
value may improve performance.
To estimate the amount of memory needed to process a range expression, use these guidelines:
For a simple query such as the following, where there is
one candidate key for the range access method, each
predicate combined with OR
uses approximately 230 bytes:
SELECT COUNT(*) FROM t
WHERE a=1 OR a=2 OR a=3 OR .. . a=N
;
Similarly for a query such as the following, each
predicate combined with AND
uses approximately 125 bytes:
SELECT COUNT(*) FROM t
WHERE a=1 AND b=1 AND c=1 ... N
;
For a query with IN()
predicates:
SELECT COUNT(*) FROM t WHERE a IN (1,2, ...,M
) AND b IN (1,2, ...,N
);
Each literal value in an
IN()
list counts as a
predicate combined with OR
.
If there are two IN()
lists, the number of predicates combined with
OR
is the product of the
number of literal values in each list. Thus, the number
of predicates combined with
OR
in the preceding case is
M
×
N
.