Preface "High Performance MySQL" mentions that using the in method can effectively replace certain range queries and improve query efficiency, because in an index, the part after the range field is invalid (ps. ICP needs to be considered). The MySQL optimizer converts the in query into n*m combinations for querying, and finally merges the returned values, which is similar to union but more efficient. MySQL will have many problems when there are too many IN() combination conditions. Query optimization can take a long time and consume a lot of memory. The new version of MySQL will not perform plan evaluation when the number of combinations exceeds a certain number, which may cause MySQL to not make good use of indexes. The certain number here is controlled by the parameter eq_range_index_dive_limit in MySQL 5.6.5 and later versions. The default setting is 10, and it has been changed to 200 since version 5.7. Of course, it can be set manually. 5.6 The manual states as follows:
In other words, eq_range_index_dive_limit = 0 Only index dive can be used 0 < eq_range_index_dive_limit <= N Use index statistics eq_range_index_dive_limit > N: only index dive can be used The default value was changed from 10 to 200 in MySQL 5.7 to ensure that the execution plan for range equality operations (IN()) is as accurate as possible, because the number of IN() lists often exceeds 10. There is a sentence in the official MySQL manual:
main idea:
There are two estimation methods:
Compare these two methods
Simply put, the value of the option eq_range_index_dive_limit sets the upper limit of the number of conditions in the IN list. When the value is exceeded, the execution plan will be changed from index dive to index statistics. Why do we need to distinguish between these two methods?
Discussion Topics
Range query and index usage The SQL is as follows: SELECT * FROM pre_forum_post WHERE tid=7932552 AND invisible IN('0','-2') ORDER BY dateline DESC LIMIT 10; The index is as follows: PRIMARY(tid,position), pid(pid), fid(tid), displayorder(tid,invisible,dateline) first(tid,first) new_auth(authorid,invisible,tid) idx_dt(dateline) mul_test(tid,invisible,dateline,pid) Look at the execution plan: root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') -> ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | 1 | SIMPLE | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4 | NULL | 54 | Using index condition; Using filesort | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ 1 row in set (0.00 sec) The MySQL optimizer thinks that this is a range query, so the dateline field in the (tid, invisible, dateline) index will definitely not be used. In other words, the final sort of this SQL will definitely generate a temporary result set, and then complete the sorting in the result set, rather than directly completing the sorting action in the index. So we tried to add an index. root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline); Query OK, 20374596 rows affected, 0 warning (600.23 sec) Records: 0 Duplicates: 0 Warnings: 0 root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ 1 row in set (0.00 sec) root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.40 sec) root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.00 sec) Experiments have shown that the effect is excellent. In fact, it is not difficult to understand. We have said above that in() retrieves data in a variety of combinations in the MySQL optimizer. If a sort or group is added, it can only be operated on a temporary result set. In other words, even if the index contains sort or group fields, it is still useless. The only complaint is that the MySQL optimizer's choice is still not reliable enough.
Description of eq_range_index_dive_limit Still using the above example, why can't idx_1 be used directly? Do I need to use a hint to force only this index to be used? Here we first look at the value of eq_range_index_dive_limit. root@localhost 22:38:05 [ultrax]> show variables like 'eq_range_index_dive_limit'; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | eq_range_index_dive_limit | 2 | +---------------------------+-------+ 1 row in set (0.00 sec) According to the situation we mentioned above, 0 < eq_range_index_dive_limit <= N uses index statistics, so let's use OPTIMIZER_TRACE to find out. { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": false, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 54, "cost": 66.81, "chosen": true } // index dive is false, and finally chosen is true ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": false, "cause": "cost" } We can see that the cost of the displayorder index is 66.81, while the cost of idx_1 is 120646, and finally the MySQL optimizer chooses the displayorder index. So if we set eq_range_index_dive_limit to > N, should we use the index dive calculation method to get a more accurate execution plan? root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3; Query OK, 0 rows affected (0.00 sec) root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | PRIMARY,displayorder,first,mul_test,idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ 1 row in set (0.00 sec) The optimize_trace results are as follows { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 188193, "cost": 225834, "chosen": true } ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": true } ... "cost_for_plan": 144775, "rows_for_plan": 120646, "chosen": true In the alternative index selection, both indexes are selected, and in the final logical optimization, the index with the lowest cost, idx_1, is selected. The above is how the value of eq_range_index_dive_limit in equal range queries affects the MySQL optimizer calculation overhead, thereby affecting the index selection. In addition, we can use profiling to see the statistical time consumption of the optimizer: Index dive +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000048 | | checking permissions | 0.000004 | | Opening tables | 0.000015 | | init | 0.000044 | | System lock | 0.000009 | | optimizing | 0.000014 | | statistics | 0.032089 | | preparing | 0.000022 | | Sorting result | 0.000003 | | executing | 0.000003 | | Sending data | 0.000101 | | end | 0.000004 | | query end | 0.000002 | | closing tables | 0.000009 | | freeing items | 0.000013 | | cleaning up | 0.000012 | +----------------------+----------+ index statistics +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000045 | | checking permissions | 0.000003 | | Opening tables | 0.000014 | | init | 0.000040 | | System lock | 0.000008 | | optimizing | 0.000014 | | statistics | 0.000086 | | preparing | 0.000016 | | Sorting result | 0.000002 | | executing | 0.000002 | | Sending data | 0.000016 | | Creating sort index | 0.412123 | | end | 0.000012 | | query end | 0.000004 | | closing tables | 0.000013 | | freeing items | 0.000023 | | cleaning up | 0.000015 | +----------------------+----------+ It can be seen that when eq_range_index_dive_limit is increased and index dive is used, the optimizer statistics take significantly longer than the ndex statistics method, but in the end it uses a more reasonable execution plan. The statistical time is 0.032089s vs .000086s, but the SQL execution time is about 0.03s vs 0.41s. Attachment: How to use optimize_trace set optimizer_trace='enabled=on'; select * from information_schema.optimizer_trace\G
How to use profile set profiling=ON; Execute sql; show profiles; show profile for query 2; show profile block io,cpu for query 2; You can also see information such as memory, swaps, context switches, source, etc. References [1]MySQL SQL Optimization Series: IN and RANGE Queries https://www.jb51.net/article/201251.htm [2]MySQL physical query optimization technology---index dive analysis http://blog.163.com/li_hx/blog/static/18399141320147521735442/ This is the end of this article about the SQL optimization tutorial on in and range queries. For more relevant SQL optimization in and range query content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Docker cleanup environment operation
>>: Detailed explanation of non-parent-child component communication in Vue3
Note: You need to give the parent container a hei...
MVCC MVCC (Multi-Version Concurrency Control) is ...
Find the problem I recently encountered a problem...
The download address of FlashFXP is: https://www....
This article shares the specific code for JavaScr...
As a lightweight open source database, MySQL is w...
Table of contents 1. Scenario Description 2. Solu...
Many organizations have the need to back up file ...
Table of contents Message Board Required librarie...
Table of contents Introduction Four characteristi...
http return code list (below is an overview) for ...
Table of contents principle Network environment p...
The 10-day tutorial uses the most understandable ...
MySQL trigger simple example grammar CREATE TRIGG...
When the jsp that is jumped to after the struts2 a...