First, let's talk about the in() query. It is mentioned in "High Performance MySQL" that using the in method can effectively replace certain range queries and improve query efficiency, because in an index, the part after the range field is invalid. When using the in method, the MySQL optimizer actually converts the query into n*m combinations and finally merges the return values, which is a bit similar to union but more efficient. At the same time, it has some problems:
The "certain number" here is controlled by the parameter eq_range_index_dive_limit in MySQL 5.6.5 and later versions (thanks to @叶金荣 for his advice). The default setting is 10, and it will be changed to 200 in versions after 5.7. Of course, we can set it manually. Let's look at the instructions in the 5.6 manual:
That is:
Index dive and index statistics are the methods used by the MySQL optimizer to estimate the cost. The former has a slow statistical speed but can obtain accurate values, while the latter has a fast statistical speed but the data may not be accurate. The optimizer can estimate the row count for each range using dives into the index or index statistics. The default value was changed from 10 to 200 in MySQL 5.7 to ensure that the execution plan for range equality operations (IN()) is as accurate as possible, because the number of IN() lists often exceeds 10. Say it in advance There are two topics for today’s article:
Range query and index usage The SQL is as follows: SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; The index is as follows: +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+------------+ | pre_forum_post | 0 | PRIMARY | 1 | tid | A | NULL | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | PRIMARY | 2 | position | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 0 | pid | 1 | pid | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | fid | 1 | fid | A | 1490 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 2 | invisible | A | 945236 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | displayorder | 3 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | first | 2 | first | A | 1215304 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 1 | authorid | A | 1963184 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 2 | invisible | A | 1963184 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | new_auth | 3 | tid | A | 12760696 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | idx_dt | 1 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 1 | tid | A | 880048 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 2 | invisible | A | 945236 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 3 | dateline | A | 25521392 | NULL | NULL | | BTREE | | | | pre_forum_post | 1 | mul_test | 4 | pid | A | 25521392 | NULL | NULL | | BTREE | | | +----------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+------------+ Look at the execution plan: root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') -> ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ | 1 | SIMPLE | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4 | NULL | 54 | Using index condition; Using filesort | +----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+ 1 row in set (0.00 sec) The MySQL optimizer thinks that this is a range query, so the dateline field in the (tid, invisible, dateline) index will definitely not be used. In other words, the final sort of this SQL will definitely generate a temporary result set, and then complete the sorting in the result set, rather than directly completing the sorting action in the index. So we tried to add an index. root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline); Query OK, 20374596 rows affected, 0 warning (600.23 sec) Records: 0 Duplicates: 0 Warnings: 0 root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+---------------+-------+---------+---+--------+-------------+ 1 row in set (0.00 sec) root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.40 sec) root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; ... 10 rows in set (0.00 sec) Experiments have shown that the effect is excellent. In fact, it is not difficult to understand. We have said above that in() retrieves data in a variety of combinations in the MySQL optimizer. If a sort or group is added, it can only be operated on a temporary result set. In other words, even if the index contains sort or group fields, it is still useless. The only complaint is that the MySQL optimizer's choice is still not reliable enough. To summarize: When using in() in MySQL queries, in addition to paying attention to the number of in() lists and the value of eq_range_index_dive_limit (see below for details), you must also pay attention to the use of indexes if the SQL contains sorting/grouping/deduplication, etc. Description of eq_range_index_dive_limit Still using the above example, why can't idx_1 be used directly? Do I need to use a hint to force only this index to be used? Here we first look at the value of eq_range_index_dive_limit. root@localhost 22:38:05 [ultrax]> show variables like 'eq_range_index_dive_limit'; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | eq_range_index_dive_limit | 2 | +---------------------------+-------+ 1 row in set (0.00 sec) According to the situation we mentioned above, 0 < eq_range_index_dive_limit <= N uses index statistics, so let's use OPTIMIZER_TRACE to find out. { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": false, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 54, "cost": 66.81, "chosen": true } // index dive is false, and finally chosen is true ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": false, "cause": "cost" } We can see that the cost of the displayorder index is 66.81, while the cost of idx_1 is 120646, and finally the MySQL optimizer chooses the displayorder index. So if we set eq_range_index_dive_limit to > N, should we use the index dive calculation method to get a more accurate execution plan? root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3; Query OK, 0 rows affected (0.00 sec) root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10; +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ | 1 | SIMPLE | pre_forum_post | ref | PRIMARY,displayorder,first,mul_test,idx_1 | idx_1 | 3 | const | 120646 | Using where | +----+-------------+----------------+------+-------------------------------------------+-------+---------+---+--------+-------------+ 1 row in set (0.00 sec) The optimize_trace results are as follows { "index": "displayorder", "ranges": [ "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2", "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 188193, "cost": 225834, "chosen": true } ... { "index": "idx_1", "ranges": [ "7932552 <= tid <= 7932552" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 120646, "cost": 144776, "chosen": true } ... "cost_for_plan": 144775, "rows_for_plan": 120646, "chosen": true // In the alternative index selection, both indexes are selected. In the final logic optimization, the index with the lowest cost, i.e. idx_1, is selected. The above is how the value of eq_range_index_dive_limit affects the MySQL optimizer calculation overhead in equal range queries, thereby affecting the index selection. In addition, we can use profiling to see the statistical time consumption of the optimizer: Index dive +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000048 | | checking permissions | 0.000004 | | Opening tables | 0.000015 | | init | 0.000044 | | System lock | 0.000009 | | optimizing | 0.000014 | | statistics | 0.032089 | | preparing | 0.000022 | | Sorting result | 0.000003 | | executing | 0.000003 | | Sending data | 0.000101 | | end | 0.000004 | | query end | 0.000002 | | closing tables | 0.000009 | | freeing items | 0.000013 | | cleaning up | 0.000012 | +----------------------+----------+ index statistics +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000045 | | checking permissions | 0.000003 | | Opening tables | 0.000014 | | init | 0.000040 | | System lock | 0.000008 | | optimizing | 0.000014 | | statistics | 0.000086 | | preparing | 0.000016 | | Sorting result | 0.000002 | | executing | 0.000002 | | Sending data | 0.000016 | | Creating sort index | 0.412123 | | end | 0.000012 | | query end | 0.000004 | | closing tables | 0.000013 | | freeing items | 0.000023 | | cleaning up | 0.000015 | +----------------------+----------+ It can be seen that when eq_range_index_dive_limit is increased and index dive is used, the optimizer statistics take significantly longer than the ndex statistics method, but in the end it uses a more reasonable execution plan. The statistical time is 0.032089s vs .000086s, but the SQL execution time is about 0.03s vs 0.41s. Appendix: How to use optimize_trace set optimizer_trace='enabled=on'; select * from information_schema.optimizer_trace\G // Note: optimizer_trace recommends enabling debugging only in session mode References http://dev.mysql.com/doc/refman/5.6/en/range-optimization.html http://imysql.com/2014/08/05/a-fake-bug-with-eq-range-index-dive-limit.shtml http://blog.163.com/li_hx/blog/static/18399141320147521735442/ This is the end of this article about the MySQL SQL optimization tutorial on in and range queries. For more information about MySQL SQL optimization on in and range queries, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Vue: Detailed explanation of memory leaks
>>: HTML implements the function of detecting input completion
Load one or more features <template> <di...
Table of contents Conditional compilation Page La...
Table of contents 1. Encapsulate complex page dat...
Welcome to the previous canvas game series: 《VUE ...
MySQL full-text index is a special index that gen...
As shown in the figure: But when viewed under IE6...
Regarding the connection method between Java and ...
Table of contents Preface Instruction Basics Hook...
What are the attributes of the JS script tag: cha...
Cross-domain solutions jsonp (simulate get) CORS ...
<br />Forms are an important channel for use...
This article example shares the specific code of ...
If you want the path following the domain name to...
Today I recommend such an open source tool for ex...
Swiper is a sliding special effects plug-in built...