In MySQL, a table can have multiple indexes, but which index a statement actually uses is decided by the MySQL optimizer when the statement runs. So what principles does the optimizer follow when selecting an index, and can it select the wrong one? Let's look at an example: create table Y with two ordinary (secondary) indexes, a and b, and a stored procedure that inserts data.
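CREATE TABLE `Y` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),   -- secondary index on a
  KEY `b` (`b`)    -- secondary index on b
) ENGINE=InnoDB;

delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  -- insert 100000 rows with a = b = i
  while(i<=100000)do
    insert into Y (`a`,`b`) values(i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

Now consider two sessions. Session A starts a transaction with a consistent snapshot (for example, START TRANSACTION WITH CONSISTENT SNAPSHOT;) and does not commit. Session B then deletes all rows from Y, calls idata() again to re-insert the same data, and runs the query below.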
If select * from Y where a between 10000 and 20000; is run on its own in Session B, the optimizer chooses index a as expected. However, if Session A and Session B are executed in the sequence described above, EXPLAIN shows that the optimizer no longer uses index a and instead performs a full table scan on the primary key index.

To confirm the effect, run set long_query_time=0; which sets the slow query log threshold to 0 so that every statement is recorded (the slow query log itself must be enabled). The log shows that the statement did not use index a and took longer to execute. So the MySQL optimizer does not always choose the right index. To understand why, we need to look at how the optimizer makes its choice.

Optimizer
The purpose of the optimizer in MySQL is to find the execution plan that runs the statement at the lowest cost. When selecting an index, the optimizer mainly considers the number of rows to scan, whether a temporary table is needed, and whether sorting is needed.
Determining the number of rows to scan
Calculating the cardinality of an index
Before executing a statement, MySQL cannot know exactly how many rows it will scan; it estimates the number from index statistics. The key statistic is the selectivity (or "discrimination") of the index: the more distinct values an index contains, the more selective it is. The number of distinct values in an index is called its cardinality, so the larger the cardinality, the better the index discriminates between rows. The Cardinality column reported by SHOW INDEX FROM Y is this value, but it is not exact, because MySQL obtains it by sampling. Roughly, InnoDB reads a small number of pages of the index, counts the distinct values on those pages, takes the average per page, and multiplies it by the number of pages in the index.
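The sampling idea can be illustrated with a toy model. The sketch below is a simplified assumption, not InnoDB's actual code: it sorts a column into fixed-size "pages" (the page size of 100 entries and the repeated-value column are hypothetical), samples N of them, averages the per-page distinct counts and scales by the page count. It shows why the estimate can be far off when values repeat across many pages.

```python
import random


def build_index_pages(values, entries_per_page):
    """Sort the indexed values and split them into fixed-size pages (a toy B+ tree leaf level)."""
    ordered = sorted(values)
    return [ordered[i:i + entries_per_page] for i in range(0, len(ordered), entries_per_page)]


def estimate_cardinality(pages, sample_pages, seed=0):
    """Toy estimator: average the distinct-value count of N sampled pages, then multiply by the page count."""
    rng = random.Random(seed)
    sample = rng.sample(pages, min(sample_pages, len(pages)))
    avg_distinct = sum(len(set(page)) for page in sample) / len(sample)
    return round(avg_distinct * len(pages))


def exact_cardinality(values):
    """The true number of distinct values, for comparison."""
    return len(set(values))


if __name__ == "__main__":
    unique_col = list(range(1, 100_001))             # like column a: 1..100000, all distinct
    repeated_col = [i % 10 for i in range(100_000)]  # hypothetical column with only 10 distinct values
    for name, col in (("unique", unique_col), ("10 distinct", repeated_col)):
        pages = build_index_pages(col, entries_per_page=100)
        print(name, "exact:", exact_cardinality(col),
              "estimated:", estimate_cardinality(pages, sample_pages=20))
```

With these inputs the unique column is estimated exactly (100000), while the repeated column is estimated at 1000 instead of 10, because a value that spans many pages is counted once on every page it appears on.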
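InnoDB can store these index statistics in two ways, selected with innodb_stats_persistent:
When innodb_stats_persistent is ON, the statistics are stored persistently on disk; the number of sampled pages N defaults to 20 (innodb_stats_persistent_sample_pages), and M, which controls automatic recalculation, is 10.
When innodb_stats_persistent is OFF, the statistics are kept only in memory; N defaults to 8 (innodb_stats_transient_sample_pages), and M is 16.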
Because the data in the table keeps changing, index statistics are recalculated automatically once more than 1/M of the rows have changed. Since the statistics come from sampling, however, the cardinality is never exact.

Error in estimating the number of rows to scan
In our example, EXPLAIN estimates 100015 rows for the full table scan and 37116 rows for the plan that uses index a, even though a between 10000 and 20000 actually matches only 10001 rows. This raises two questions: why is the estimate for index a so far off, and why does the optimizer still choose the full table scan when 37116 is much smaller than 100015?

Let's look at the second question first. If index a is used, MySQL must not only scan index a but also go back to the primary key index to fetch each matching row. The optimizer includes the cost of these lookups, and once that cost is added, the full table scan looks cheaper.

Now back to the first question: why the row count is estimated incorrectly. This is related to the consistent read view. Session A opened a consistent read view and has not committed. Session B then deleted all rows from Y and inserted the same data again. Because Session A's read view may still need the old rows, InnoDB cannot purge them: the deleted rows remain in index a, marked as deleted, alongside the newly inserted rows. Each row therefore has two versions in index a, the old delete-marked version and the new one, so index a effectively holds two copies of the data. This is what throws off the estimated number of rows.
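The two effects above can be combined in a toy model. Everything here is an illustrative assumption: the cost weights ROW_EVAL_COST and LOOKUP_COST are hypothetical (they are not MySQL's cost constants), and the estimator simply counts every entry in the range, including delete-marked ones. The sketch shows how unpurged old versions inflate the estimate for index a, and how the back-to-table cost then tips the choice toward the full table scan.

```python
from dataclasses import dataclass

# Hypothetical cost weights for illustration only (NOT MySQL's real cost model).
ROW_EVAL_COST = 0.2  # reading and evaluating one row
LOOKUP_COST = 1.0    # extra cost of one "back to the table" primary key lookup


@dataclass
class IndexEntry:
    key: int
    delete_marked: bool = False


def build_index_a(rows, purge_blocked):
    """Index a after Session B deletes every row and re-inserts the same data.

    If an open read view (Session A) blocks purge, the old delete-marked entries stay in the index.
    """
    entries = [IndexEntry(i) for i in range(1, rows + 1)]  # newly inserted rows
    if purge_blocked:
        entries += [IndexEntry(i, delete_marked=True) for i in range(1, rows + 1)]
    return entries


def estimate_range_rows(index, lo, hi):
    """Toy estimator: counts every entry with lo <= key <= hi, delete-marked or not."""
    return sum(1 for e in index if lo <= e.key <= hi)


def choose_plan(index_rows, table_rows):
    """Compare a full scan with 'scan index a, then look up each row on the primary key'."""
    full_scan_cost = table_rows * ROW_EVAL_COST
    index_cost = index_rows * (ROW_EVAL_COST + LOOKUP_COST)
    return "index a" if index_cost < full_scan_cost else "full table scan"


if __name__ == "__main__":
    for blocked in (False, True):
        idx = build_index_a(100_000, purge_blocked=blocked)
        est = estimate_range_rows(idx, 10_000, 20_000)
        print("purge blocked:", blocked, "estimate:", est, "plan:", choose_plan(est, table_rows=100_015))
```

Under these assumed weights, the clean index gives an estimate of 10001 and picks index a, while the doubled index gives 20002 and picks the full table scan. The real estimate in the article (37116) comes from sampling rather than exact counting, so this model only reproduces the direction of the error, not its size.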
Solutions to choosing the wrong index
Re-collect the statistics: when the number of rows reported by EXPLAIN differs greatly from the actual number of rows, you can run analyze table Y; to recalculate the index statistics.
Force the index: use force index to name the index directly. MySQL then uses that index and treats a table scan as very expensive, instead of weighing the alternatives itself. Forcing an index has drawbacks, however: the index name is hard-coded into the SQL, so the statement must be changed if the index is renamed or dropped, and FORCE INDEX is MySQL-specific syntax that other databases may not accept.
Optimize the SQL statement to guide the optimizer to the correct index
Let's look at a similar example:
select * from Y where a between 1 and 1000 and b between 50000 and 100000 order by b limit 1;
This statement can use either index a or index b, and each index is a separate B+ tree. If index a is chosen, MySQL scans the first 1000 values of index a (a from 1 to 1000), looks up each row on the primary key, and filters on b. If index b is chosen, it scans the last 50001 values of index b (b from 50000 to 100000) and filters on a. In theory index a should be chosen, yet the optimizer chooses index b. It does so because of the order by b clause: index b already returns rows sorted by b, so no separate sort is needed, and the optimizer judges that this saving outweighs the extra rows scanned. In actual execution time, however, index a is faster, so MySQL has chosen the wrong index again.

One fix is to rewrite the statement so that the optimizer weighs the indexes differently. The precondition is that the rewritten statement must return logically the same result as the original; with limit 1, for example, it must still return the same single row.
Another rewrite is:
select * from (select * from Y where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 100) alias limit 1;
Now the optimizer chooses the appropriate index. Here limit 100 makes the optimizer estimate that using index b is more expensive, so it chooses index a; the limit 100 exists only to steer the optimizer's choice.

Adjusting the index
Sometimes you can create an index that suits the query better, or, following indexing principles, drop indexes that are not needed.

Summary
MySQL can choose the wrong index. The optimizer's choice rests mainly on three factors: the number of rows to scan, whether a temporary table is needed, and whether sorting is needed. The number of rows to scan depends mainly on the cardinality, and because cardinality statistics come from sampling, the estimated number of rows may be inaccurate.
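For the order by b limit 1 example above, a simplified cost model shows why a small LIMIT favours index b and limit 100 does not. This is an assumption-laden sketch, not MySQL's real algorithm: it assumes a and b are independent and uniformly distributed (exactly the premise that fails here, since idata() sets a = b), and the weights LOOKUP_COST and SORT_COST are hypothetical.

```python
TOTAL_ROWS = 100_000
LOOKUP_COST = 1.2  # hypothetical cost per index entry read plus its primary key lookup
SORT_COST = 0.1    # hypothetical per-row cost of sorting by b (index a returns rows unordered by b)


def rows_in_range(lo, hi):
    return hi - lo + 1


def estimated_rows_index_b(limit, rows_a, rows_b):
    """Entries the model expects to read on index b before finding `limit` matches,
    assuming the a-predicate matches rows_a / TOTAL_ROWS of rows, independently of b."""
    return min(rows_b, limit * TOTAL_ROWS / rows_a)


def choose_index(limit, a_lo=1, a_hi=1000, b_lo=50_000, b_hi=100_000):
    rows_a = rows_in_range(a_lo, a_hi)
    rows_b = rows_in_range(b_lo, b_hi)
    cost_a = rows_a * LOOKUP_COST + rows_a * SORT_COST  # read every match on a, then sort by b
    cost_b = estimated_rows_index_b(limit, rows_a, rows_b) * LOOKUP_COST  # already ordered, stops early
    return "b" if cost_b < cost_a else "a"


def actual_rows_scanned_on_b(limit, a_lo=1, a_hi=1000, b_lo=50_000, b_hi=100_000):
    """Walk index b in order over the real data of table Y (a = b = i) and count entries read."""
    scanned = found = 0
    for b in range(b_lo, b_hi + 1):
        scanned += 1
        a = b  # idata() inserts a = b = i
        if a_lo <= a <= a_hi:
            found += 1
            if found == limit:
                break
    return scanned


if __name__ == "__main__":
    for limit in (1, 100):
        print("limit", limit, "-> index", choose_index(limit))
    print("entries actually read on index b:", actual_rows_scanned_on_b(1))
```

Under these assumptions the model expects to read only 100 entries of index b for limit 1 (so it picks b) but 10000 for limit 100 (so it picks a). On the real data no row satisfies both predicates, so index b reads all 50001 entries.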
When the estimated number of rows is wrong, you can use analyze table to recalculate the statistics; in other cases, force index, rewriting the SQL statement, or adjusting the indexes can steer MySQL to the correct index.