Reasons and solutions for MySQL selecting the wrong index

In MySQL, a table can have multiple indexes, but which one a statement actually uses is decided by MySQL's optimizer when the statement runs. How does the optimizer choose an index, and can it choose the wrong one?

Let’s look at an example:

Create table Y with two ordinary secondary indexes, a and b, and a stored procedure that inserts 100,000 rows in which a and b both equal the loop counter.

Environment: MySQL 5.7.27, isolation level REPEATABLE READ (RR).

CREATE TABLE `Y` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `a` (`a`),
  KEY `b` (`b`)
) ENGINE=InnoDB;

-- idata() inserts 100,000 rows with a = b = i
delimiter ;;
create procedure idata()
begin
  declare i int;
  set i=1;
  while (i<=100000) do
    insert into Y (`a`,`b`) values (i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

Now run the following statements in two sessions, in this order:

  • Session A: start transaction with consistent snapshot;
  • Session B: delete from Y;
  • Session B: call idata();
  • Session B: explain select * from Y where a between 10000 and 20000;
  • Session B: explain select * from Y force index(a) where a between 10000 and 20000;
  • Session A: commit;

Run on its own, without the Session A transaction, select * from Y where a between 10000 and 20000; uses index a, as expected.

However, if you run Session A and Session B in the order shown above, EXPLAIN tells a different story. The plain query no longer uses index a and is estimated at 100015 rows. The version with force index(a) is estimated at 37116 rows.

In Session B, the optimizer skips index a and instead performs a full table scan on the primary key (clustered) index.

Enable the slow query log and set its threshold to 0 so that every statement is recorded:

set long_query_time=0;

The recorded log shows that the query for which MySQL did not choose index a took longer to run.

So the MySQL optimizer does not always choose the right index. To understand why, we need to start with how the optimizer makes its choice.

Optimizer

The purpose of the optimizer in MySQL is to find an optimal execution plan so as to execute the statement at the lowest cost.

The optimizer mainly considers the following factors when selecting an index:

  • Number of rows scanned: the fewer rows scanned, the fewer disk accesses and the less CPU consumed.
  • Whether a temporary table is needed.
  • Whether sorting is needed.

Estimating the number of rows scanned

Calculate the cardinality of an index

Before executing a statement, MySQL cannot know exactly how many rows it will scan. Instead, it estimates the count from index statistics. The key statistic is the selectivity of the index: the more distinct values an index contains, the more selective it is. The number of distinct values in an index is called its cardinality, so the larger the cardinality, the more selective the index.

The Cardinality column in the output of show index from Y reports this value, but it is not exact: MySQL obtains the cardinality by sampling.

InnoDB picks N data pages, counts the distinct values on each page, and averages those counts. It then multiplies the average by the number of pages in the index, and the result is the index's cardinality.
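To make the procedure concrete, here is a minimal Python sketch of the page-sampling estimate as described above: average the distinct values per sampled page, then multiply by the page count. It is an illustration under simplifying assumptions, not InnoDB's actual implementation. The function names, the 100-entries-per-page layout and the data are all hypothetical. Because a value that spans two pages is counted once on each page, this simple formula overestimates here even when every page is sampled.

```python
import random

def estimate_cardinality(pages, sample_pages, seed=0):
    """Sample up to `sample_pages` pages, average their distinct-value counts,
    and scale the average by the total number of pages."""
    rng = random.Random(seed)
    sample = rng.sample(pages, min(sample_pages, len(pages)))
    avg_distinct = sum(len(set(page)) for page in sample) / len(sample)
    return round(avg_distinct * len(pages))

def exact_cardinality(pages):
    """Count distinct values across the whole index."""
    return len({value for page in pages for value in page})

# Hypothetical index: 10 distinct values, each repeated 150 times,
# stored in key order with 100 entries per page (15 pages in total).
entries = [value for value in range(10) for _ in range(150)]
pages = [entries[i:i + 100] for i in range(0, len(entries), 100)]

print(exact_cardinality(pages))         # 10
print(estimate_cardinality(pages, 20))  # 20: all 15 pages sampled, boundary values double-counted
print(estimate_cardinality(pages, 8))   # varies with which 8 pages are sampled
```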

InnoDB can store these index statistics in two ways, selected with innodb_stats_persistent. M is the recalculation threshold explained below.

  • ON: statistics are stored persistently. N defaults to 20 (innodb_stats_persistent_sample_pages) and M to 10.
  • OFF: statistics are kept only in memory. N defaults to 8 (innodb_stats_transient_sample_pages) and M to 16.

Because table data keeps changing, InnoDB automatically recalculates the index statistics once more than 1/M of the rows have been modified.
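The N/M rule can be written down as a small Python sketch. The StatsConfig class and the needs_recalc function are hypothetical names, and the defaults are the ones quoted above. InnoDB's internal trigger condition may differ in detail. The comparison is done in integer arithmetic, so the 1/M boundary is exact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatsConfig:
    sample_pages: int    # N: pages sampled per index
    recalc_divisor: int  # M: recalculate once more than 1/M of the rows change

# Defaults quoted in the text.
PERSISTENT = StatsConfig(sample_pages=20, recalc_divisor=10)  # innodb_stats_persistent=ON
TRANSIENT = StatsConfig(sample_pages=8, recalc_divisor=16)    # innodb_stats_persistent=OFF

def needs_recalc(changed_rows: int, total_rows: int, cfg: StatsConfig) -> bool:
    """True once changed_rows exceeds total_rows / M (integer math avoids rounding)."""
    return changed_rows * cfg.recalc_divisor > total_rows

print(needs_recalc(10_000, 100_000, PERSISTENT))  # False: exactly 1/10
print(needs_recalc(10_001, 100_000, PERSISTENT))  # True
print(needs_recalc(6_251, 100_000, TRANSIENT))    # True: just over 1/16
```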

Either way, because the statistics come from sampling, the cardinality is only an approximation.

Errors in the estimated row count

As we saw earlier, EXPLAIN estimates 100015 rows for select * from Y where a between 10000 and 20000. That is understandable, because the plan is a full table scan.

For select * from Y force index(a) where a between 10000 and 20000, however, the estimate is 37116 rows, which is hard to explain. Ideally it would be 10001 rows: the values 10000 through 20000, with the scan stopping when it reaches 20001.

Stranger still, even though 37116 is an overestimate, it is far smaller than the 100015 rows of a full table scan. So why does the optimizer still choose the full scan?

Take the second question first. If the optimizer uses index a, then for every matching entry in index a it must also fetch the full row from the primary key index (going back to the table). The optimizer adds the cost of those lookups. At 37116 estimated rows, it judges the index plan to be more expensive than simply scanning the primary key, so it chooses the full table scan.
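The trade-off can be illustrated with a toy cost model in Python. The three per-row constants are invented for this example and are not MySQL's real cost constants. The function names are hypothetical. The sketch only shows two things: adding a back-to-table lookup for each index entry can make 37116 estimated rows cost more than 100015 sequential rows, and the correct estimate of 10001 rows would have favored index a.

```python
# Hypothetical per-row costs, chosen only for illustration (NOT MySQL's real cost constants).
SCAN_COST_PER_ROW = 1.0    # read one row during a sequential primary-key scan
INDEX_COST_PER_ROW = 1.0   # read one entry from secondary index a
LOOKUP_COST_PER_ROW = 2.0  # fetch the full row from the primary key (random access)


def full_scan_cost(table_rows: int) -> float:
    """Cost of reading every row through the primary key."""
    return table_rows * SCAN_COST_PER_ROW


def index_range_cost(estimated_rows: int) -> float:
    """Cost of a range on index a: each entry also needs a lookup back to the table."""
    return estimated_rows * (INDEX_COST_PER_ROW + LOOKUP_COST_PER_ROW)


def choose_plan(table_rows: int, estimated_rows: int) -> str:
    """Pick whichever plan the toy model says is cheaper."""
    if index_range_cost(estimated_rows) < full_scan_cost(table_rows):
        return "index a"
    return "full table scan"


print(choose_plan(100015, 37116))  # full table scan: 111348 > 100015
print(choose_plan(100015, 10001))  # index a: 30003 < 100015
```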

Now back to the first question: why is the row estimate wrong? The answer lies in the consistent read view. Session A starts a transaction with a consistent snapshot and does not commit it. Session B then deletes every row in Y and inserts the same data again. Each deleted row stays in the index as a delete-marked old version, because Session A's snapshot may still need to read it, and the re-inserted row adds a new version alongside it. Index a therefore holds two entries for every value of a, which throws off the row estimate.

InnoDB deletes a record by delete-marking it rather than removing it from the index and data files right away. The space is reclaimed later by purge. Because consistent reads must still be able to see the old rows, that space cannot be reclaimed while the snapshot is open, and meanwhile the new inserts take up additional space. The resulting statistics are inaccurate.
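A toy Python model of index a during the experiment shows why the estimate inflates. It is a deliberate simplification. The IndexEntry class and both counting functions are hypothetical, and the model does not reproduce InnoDB's actual estimate of 37116. It only shows that, while Session A's snapshot blocks purge, the index physically holds about twice as many entries in the range as a new transaction can see.

```python
from dataclasses import dataclass

@dataclass
class IndexEntry:
    key: int                     # value of column a
    delete_marked: bool = False  # set by DELETE; the entry is removed only when purge runs

def physical_entries(index, low, high):
    """Entries an estimate based on physical index records would see, delete-marked ones included."""
    return sum(1 for e in index if low <= e.key <= high)

def live_entries(index, low, high):
    """Entries a new transaction would actually read."""
    return sum(1 for e in index if low <= e.key <= high and not e.delete_marked)

# After the first call idata(): one entry per value of a.
index_a = [IndexEntry(key) for key in range(1, 100_001)]

# Session B: delete from Y. Session A's open snapshot blocks purge,
# so the old entries are only delete-marked, not removed.
for entry in index_a:
    entry.delete_marked = True

# Session B: call idata() again, adding a second entry for every value.
index_a.extend(IndexEntry(key) for key in range(1, 100_001))

print(physical_entries(index_a, 10_000, 20_000))  # 20002
print(live_entries(index_a, 10_000, 20_000))      # 10001
```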

Solutions when the wrong index is chosen

If the row count estimated by EXPLAIN differs greatly from the actual number of rows, run analyze table (for example, analyze table Y;) to recalculate the index statistics.
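What counts as "differs greatly" is a judgment call. The Python sketch below is one hypothetical way to make that judgment explicit. The estimate_is_off name and its default ratio of 2.0 are assumptions, not MySQL settings. The actual row count could come from, for example, a select count(*) with the same where condition.

```python
def estimate_is_off(explain_rows: int, actual_rows: int, max_ratio: float = 2.0) -> bool:
    """Hypothetical rule of thumb: treat the estimate as badly off when it and the
    real row count differ by more than max_ratio in either direction."""
    if explain_rows == 0 or actual_rows == 0:
        return explain_rows != actual_rows
    ratio = max(explain_rows, actual_rows) / min(explain_rows, actual_rows)
    return ratio > max_ratio

# Numbers from the experiment: EXPLAIN with force index(a) estimated 37116 rows,
# while the range actually holds 10001 rows.
print(estimate_is_off(37116, 10001))  # True (about 3.7x)
print(estimate_is_off(10500, 10001))  # False
```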

Another option is force index, which names the index to use directly so the optimizer does not make the choice. It has drawbacks, though:

  • It is MySQL-specific syntax, so it may not be supported if you migrate to another database.
  • It is inflexible. Because wrong index choices are rare, the hint is usually added only after a problem appears in production, and then the code has to be changed, re-tested, and redeployed.

Rewrite the SQL statement to guide the optimizer to the right index

Consider the following statement:

select * from Y where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1;

Either index a or index b could serve this query, and each index is a B+ tree. Because a row must satisfy both conditions, each plan scans one index and checks the other column on every row it reads:

  • Index a: scan the entries for a = 1 through 1000 (about 1000 rows, stopping at 1001) and check b on each.
  • Index b: scan b = 50000 through 100000 (about 50001 rows) and check a on each.

In theory index a is the better choice, yet the optimizer picks index b.

The optimizer picks b because of the order by b clause: index b is already ordered by b, so using it avoids a separate sort step.

In practice, however, the query finishes faster with index a, so the optimizer has once again chosen the wrong index.
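A toy Python simulation of the two plans makes the trade-off visible. It assumes a freshly filled table Y, so a = b = id, and models each index as a sorted list of (key, id) pairs. The names plan_index_a and plan_index_b are hypothetical, and the counts are index entries read inside each range, not MySQL cost units. Because a = b in every row, no row satisfies both conditions. Index b's ordering therefore never gets a chance to stop the scan early, and the index b plan reads about 50 times as many entries as the index a plan.

```python
import bisect

N = 100_000
# Toy copy of a freshly filled table Y: a = b = id for id in 1..N.
rows = {rid: (rid, rid) for rid in range(1, N + 1)}         # id -> (a, b)
index_a = sorted((a, rid) for rid, (a, _b) in rows.items())  # (a, id), like secondary index a
index_b = sorted((b, rid) for rid, (_a, b) in rows.items())  # (b, id), like secondary index b

A_LOW, A_HIGH = 1, 1000
B_LOW, B_HIGH = 50_000, 100_000

def plan_index_a(limit):
    """Range-scan index a, check b on each row, then sort the matches by b."""
    scanned, matches = 0, []
    for a, rid in index_a[bisect.bisect_left(index_a, (A_LOW, 0)):]:
        if a > A_HIGH:
            break
        scanned += 1
        b = rows[rid][1]
        if B_LOW <= b <= B_HIGH:
            matches.append((b, rid))
    matches.sort()  # the extra sort step that index b would avoid
    return scanned, matches[:limit]

def plan_index_b(limit):
    """Range-scan index b (already ordered by b), check a, stop after `limit` matches."""
    scanned, matches = 0, []
    for b, rid in index_b[bisect.bisect_left(index_b, (B_LOW, 0)):]:
        if b > B_HIGH:
            break
        scanned += 1
        a = rows[rid][0]
        if A_LOW <= a <= A_HIGH:
            matches.append((b, rid))
            if len(matches) == limit:
                break
    return scanned, matches

print(plan_index_a(1))  # (1000, []): 1000 index entries read
print(plan_index_b(1))  # (50001, []): no match, so the early stop never happens
```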

One fix is to change order by b limit 1 to order by b,a limit 1. Now neither index returns rows in the required (b, a) order, so both plans need a sort. With that advantage gone, the number of rows scanned becomes the optimizer's main criterion, and it picks the correct index, a.

This rewrite is only valid if it leaves the query's logical result unchanged. With limit 1, order by b and order by b,a both return a row with the smallest b, so the results agree. With a larger limit such as 100 this is not guaranteed in general: when several rows share the same b value, which rows come back, and in what order, can differ.
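The equivalence argument can be checked on a toy example. This Python sketch uses hypothetical rows and adds a duplicate b value on purpose (table Y has none) to show what order by b,a changes. With limit 1, both orderings return a row with the smallest b. Once ties on b are involved, however, the specific rows, or their order, can differ.

```python
# Hypothetical rows (id, a, b). In table Y every b is distinct; here two rows share b = 5.
rows = [(1, 30, 5), (2, 10, 5), (3, 20, 7), (4, 5, 9)]

def order_by_b(rs, limit):
    # SQL leaves the order of rows with equal b unspecified; Python's stable
    # sort happens to keep them in id order.
    return sorted(rs, key=lambda r: r[2])[:limit]

def order_by_b_a(rs, limit):
    # Ties on b are broken by a.
    return sorted(rs, key=lambda r: (r[2], r[1]))[:limit]

# limit 1: both return a row with the smallest b.
print(order_by_b(rows, 1)[0][2], order_by_b_a(rows, 1)[0][2])  # 5 5

# limit 2: the tied rows come back in a different order.
print(order_by_b(rows, 2))    # [(1, 30, 5), (2, 10, 5)]
print(order_by_b_a(rows, 2))  # [(2, 10, 5), (1, 30, 5)]
```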

Another rewrite:

select * from (select * from Y where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 100) alias limit 1;

This time the optimizer chooses the appropriate index. With limit 100, it judges index b to be the more expensive option and picks index a instead. The limit 100 is there only to steer the optimizer toward that choice.

Adjusting the index

See whether a better-suited index can be created for the optimizer to choose, or apply index design principles to drop indexes that are not needed.

Summary

MySQL can choose the wrong index. The optimizer weighs three main factors when choosing one: the number of rows to scan, whether a temporary table is needed, and whether sorting is needed. The row count depends mainly on index cardinality, which is estimated by sampling, so the estimated number of rows can be inaccurate.

When the row estimate is wrong, you can take one of the following steps:

  • Run analyze table to recalculate the statistics.
  • Name the index explicitly with force index.
  • Rewrite the SQL, without changing its result, to steer the optimizer toward the right choice.
  • Adjust the indexes.

The above covers why MySQL can choose the wrong index and how to fix it. For more on MySQL indexes, see the related articles on 123WORDPRESS.COM.
