SQL Optimization Tutorial: IN and RANGE Queries

SQL Optimization Tutorial: IN and RANGE Queries

Preface

"High Performance MySQL" mentions that using the in method can effectively replace certain range queries and improve query efficiency, because in an index, the part after the range field is invalid (ps. ICP needs to be considered). The MySQL optimizer converts the in query into n*m combinations for querying, and finally merges the returned values, which is similar to union but more efficient.

MySQL will have many problems when there are too many IN() combination conditions. Query optimization can take a long time and consume a lot of memory. The new version of MySQL will not perform plan evaluation when the number of combinations exceeds a certain number, which may cause MySQL to not make good use of indexes.

The certain number here is controlled by the parameter eq_range_index_dive_limit in MySQL 5.6.5 and later versions. The default setting is 10, and it has been changed to 200 since version 5.7. Of course, it can be set manually. 5.6 The manual states as follows:

The eq_range_index_dive_limit system variable enables you to configure the number of values ​​at which the optimizer switches from one row estimation strategy to the other. To disable use of statistics and always use index dives, set eq_range_index_dive_limit to 0. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. eq_range_index_dive_limit is available as of MySQL 5.6.5. Before 5.6.5, the optimizer uses index dives, which is equivalent to eq_range_index_dive_limit=0.

In other words,

eq_range_index_dive_limit = 0 Only index dive can be used

0 < eq_range_index_dive_limit <= N Use index statistics

eq_range_index_dive_limit > N: only index dive can be used

The default value was changed from 10 to 200 in MySQL 5.7 to ensure that the execution plan for range equality operations (IN()) is as accurate as possible, because the number of IN() lists often exceeds 10.

There is a sentence in the official MySQL manual:

The optimizer can estimate the row count for each range using dives into the index or index statistics.

main idea:

The optimizer estimates the number of tuples included in each range segment - for example, "a IN (10, 20, 30)" is considered an equality comparison, and the three range segments are actually simplified to three single values, 10, 20, and 30. The reason for using range segments is that most of the "range" scans in MySQL are range scans, and the single value here can be considered a special case of a range segment.

There are two estimation methods:

  1. Diving into the index means using the index to estimate the number of tuples, which is referred to as index dive;
  2. Index statistics: use the statistical values ​​of the index to make estimates;

Compare these two methods

  1. Index dive: slow, but can get accurate values ​​(MySQL's implementation is to count the number of index items corresponding to the index, so it is accurate)
  2. Index statistics: Fast, but the values ​​obtained may not be accurate

Simply put, the value of the option eq_range_index_dive_limit sets the upper limit of the number of conditions in the IN list. When the value is exceeded, the execution plan will be changed from index dive to index statistics.

Why do we need to distinguish between these two methods?

  1. The query optimizer uses the cost estimation model to calculate the cost of each plan and selects the one with the lowest cost.
  2. When scanning a single table, the cost needs to be calculated; so the index scan of a single table also needs to calculate the cost
  3. The calculation formula for a single table is usually: Cost = Number of tuples * Average IO value
  4. So no matter which scanning method is used, the number of tuples needs to be calculated
  5. When encountering an expression like "a IN (10, 20, 30)", if we find that column a has an index, we need to see how many tuples this index can scan and calculate the index scan cost. Therefore, we use the two methods mentioned in this article: "index dive" and "index statistics".

Discussion Topics

  1. Range query and index usage
  2. Description of eq_range_index_dive_limit

Range query and index usage

The SQL is as follows:

SELECT * FROM pre_forum_post WHERE tid=7932552 AND invisible IN('0','-2') ORDER BY dateline DESC LIMIT 10;

The index is as follows:

PRIMARY(tid,position),
pid(pid),
fid(tid),
displayorder(tid,invisible,dateline)
first(tid,first)
new_auth(authorid,invisible,tid)
idx_dt(dateline)
mul_test(tid,invisible,dateline,pid)

Look at the execution plan:

root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') 
 -> ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4 | NULL | 54 | Using index condition; Using filesort | 
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

The MySQL optimizer thinks that this is a range query, so the dateline field in the (tid, invisible, dateline) index will definitely not be used. In other words, the final sort of this SQL will definitely generate a temporary result set, and then complete the sorting in the result set, rather than directly completing the sorting action in the index. So we tried to add an index.

root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline); 
Query OK, 20374596 rows affected, 0 warning (600.23 sec)
Records: 0 Duplicates: 0 Warnings: 0
root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+
| 1 | SIMPLE | pre_forum_post | ref | idx_1 | idx_1 | 3 | const | 120646 | Using where | 
+----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+
1 row in set (0.00 sec)
root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.40 sec)
root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.00 sec)

Experiments have shown that the effect is excellent. In fact, it is not difficult to understand. We have said above that in() retrieves data in a variety of combinations in the MySQL optimizer. If a sort or group is added, it can only be operated on a temporary result set. In other words, even if the index contains sort or group fields, it is still useless. The only complaint is that the MySQL optimizer's choice is still not reliable enough.

To summarize: When using in() in MySQL queries, in addition to paying attention to the number of in() lists and the value of eq_range_index_dive_limit (see below for details), you must also pay attention to the use of indexes if the SQL contains sorting/grouping/deduplication, etc.

Description of eq_range_index_dive_limit

Still using the above example, why can't idx_1 be used directly? Do I need to use a hint to force only this index to be used? Here we first look at the value of eq_range_index_dive_limit.

root@localhost 22:38:05 [ultrax]> show variables like 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 2 | 
+---------------------------+-------+
1 row in set (0.00 sec)

According to the situation we mentioned above, 0 < eq_range_index_dive_limit <= N uses index statistics, so let's use OPTIMIZER_TRACE to find out.

{
 "index": "displayorder",
 "ranges": [
 "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2",
 "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0"
 ],
 "index_dives_for_eq_ranges": false,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 54,
 "cost": 66.81,
 "chosen": true
}
// index dive is false, and finally chosen is true
...
{
 "index": "idx_1",
 "ranges": [
 "7932552 <= tid <= 7932552"
 ],
 "index_dives_for_eq_ranges": true,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 120646,
 "cost": 144776,
 "chosen": false,
 "cause": "cost"
}

We can see that the cost of the displayorder index is 66.81, while the cost of idx_1 is 120646, and finally the MySQL optimizer chooses the displayorder index. So if we set eq_range_index_dive_limit to > N, should we use the index dive calculation method to get a more accurate execution plan?

root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3;
Query OK, 0 rows affected (0.00 sec)
root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+
| 1 | SIMPLE | pre_forum_post | ref | PRIMARY,displayorder,first,mul_test,idx_1 | idx_1 | 3 | const | 120646 | Using where | 
+----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+
1 row in set (0.00 sec)

The optimize_trace results are as follows

{
 "index": "displayorder",
 "ranges": [
 "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2",
 "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0"
 ],
 "index_dives_for_eq_ranges": true,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 188193,
 "cost": 225834,
 "chosen": true
}
...
{
 "index": "idx_1",
 "ranges": [
 "7932552 <= tid <= 7932552"
 ],
 "index_dives_for_eq_ranges": true,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 120646,
 "cost": 144776,
 "chosen": true
}
...
 "cost_for_plan": 144775,
 "rows_for_plan": 120646,
 "chosen": true

In the alternative index selection, both indexes are selected, and in the final logical optimization, the index with the lowest cost, idx_1, is selected. The above is how the value of eq_range_index_dive_limit in equal range queries affects the MySQL optimizer calculation overhead, thereby affecting the index selection. In addition, we can use profiling to see the statistical time consumption of the optimizer:

Index dive

+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000048 | 
| checking permissions | 0.000004 | 
| Opening tables | 0.000015 | 
| init | 0.000044 | 
| System lock | 0.000009 | 
| optimizing | 0.000014 | 
| statistics | 0.032089 | 
| preparing | 0.000022 | 
| Sorting result | 0.000003 | 
| executing | 0.000003 | 
| Sending data | 0.000101 | 
| end | 0.000004 | 
| query end | 0.000002 | 
| closing tables | 0.000009 | 
| freeing items | 0.000013 | 
| cleaning up | 0.000012 | 
+----------------------+----------+

index statistics

+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000045 | 
| checking permissions | 0.000003 | 
| Opening tables | 0.000014 | 
| init | 0.000040 | 
| System lock | 0.000008 | 
| optimizing | 0.000014 | 
| statistics | 0.000086 | 
| preparing | 0.000016 | 
| Sorting result | 0.000002 | 
| executing | 0.000002 | 
| Sending data | 0.000016 | 
| Creating sort index | 0.412123 | 
| end | 0.000012 | 
| query end | 0.000004 | 
| closing tables | 0.000013 | 
| freeing items | 0.000023 | 
| cleaning up | 0.000015 | 
+----------------------+----------+

It can be seen that when eq_range_index_dive_limit is increased and index dive is used, the optimizer statistics take significantly longer than the ndex statistics method, but in the end it uses a more reasonable execution plan. The statistical time is 0.032089s vs .000086s, but the SQL execution time is about 0.03s vs 0.41s.

Attachment:

How to use optimize_trace

set optimizer_trace='enabled=on';

select * from information_schema.optimizer_trace\G

Note: optimizer_trace recommends enabling debugging only in session mode

How to use profile

set profiling=ON;
Execute sql;
show profiles;
show profile for query 2;
show profile block io,cpu for query 2;

You can also see information such as memory, swaps, context switches, source, etc.

References

[1]MySQL SQL Optimization Series: IN and RANGE Queries

https://www.jb51.net/article/201251.htm

[2]MySQL physical query optimization technology---index dive analysis

http://blog.163.com/li_hx/blog/static/18399141320147521735442/

This is the end of this article about the SQL optimization tutorial on in and range queries. For more relevant SQL optimization in and range query content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Analyze the techniques for optimizing distinct in MySQL
  • Examples of optimization techniques for slow query efficiency in MySQL IN statements
  • MySQL query optimization: Introduction to join query sort limit (join, order by, limit statement)
  • MySQL optimization: use join instead of subquery
  • SQL statement optimization: JOIN, LEFT JOIN and RIGHT JOIN statement optimization
  • SQL optimization for count, table join order, condition order, in and exist
  • Optimization of not in and minus in MySQL
  • Sharing on count optimization issues in innodb in mysql
  • Tutorial on the use and optimization of insert statements in MySQL

<<:  Docker cleanup environment operation

>>:  Detailed explanation of non-parent-child component communication in Vue3

Recommend

Implementing circular scrolling list function based on Vue

Note: You need to give the parent container a hei...

In-depth study of MySQL multi-version concurrency control MVCC

MVCC MVCC (Multi-Version Concurrency Control) is ...

FlashFXP ftp client software registration cracking method

The download address of FlashFXP is: https://www....

JavaScript to achieve calendar effect

This article shares the specific code for JavaScr...

Detailed explanation of how to install MySQL on Alibaba Cloud

As a lightweight open source database, MySQL is w...

A brief discussion on the solution to excessive data in ElementUI el-select

Table of contents 1. Scenario Description 2. Solu...

Node.js+express message board function implementation example

Table of contents Message Board Required librarie...

MySQL transaction details

Table of contents Introduction Four characteristi...

HTTP return code list (Chinese and English explanation)

http return code list (below is an overview) for ...

Detailed process of configuring NIS in Centos7

Table of contents principle Network environment p...

Website Building Tutorial for Beginners: Learn to Build a Website in Ten Days

The 10-day tutorial uses the most understandable ...

MySQL trigger detailed explanation and simple example

MySQL trigger simple example grammar CREATE TRIGG...