MySQL optimization tutorial: large paging query

Background

If you do backend development, you will sooner or later need to implement paging, and almost everyone handles it with MySQL's LIMIT clause; the project I am currently responsible for is written this way too. However, once the amount of data grows, LIMIT becomes extremely inefficient. This article discusses how to optimize the LIMIT clause.

LIMIT Optimization

Many business scenarios require the paging function, which is basically implemented using LIMIT.

Create a table and insert 2 million records:

# Create a new t5 table
CREATE TABLE `t5` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(50) NOT NULL,
  `text` varchar(100) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_name` (`name`),
  KEY `ix_test` (`text`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

# Create a stored procedure to insert 2 million rows.
# (DELIMITER is needed in the mysql client so the semicolons
# inside the procedure body do not end the statement early.)
DELIMITER //
CREATE PROCEDURE t5_insert_200w()
BEGIN
  DECLARE i INT;
  SET i = 1000000;
  WHILE i <= 3000000 DO
    INSERT INTO t5(`name`, `text`) VALUES ('god-jiang666', concat('text', i));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

# Call the stored procedure to insert the data
CALL t5_insert_200w();
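If you want to experiment without a MySQL server at hand, a rough stand-in for the table and data can be built in SQLite from Python. This is only a sketch for the later examples: the table name `t5` and the value patterns mirror the SQL above, a much smaller row count keeps it fast, and timings will of course differ from MySQL.

```python
import sqlite3

# In-memory stand-in for the MySQL t5 table; InnoDB-specific details are omitted.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t5 (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        text TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX ix_name ON t5(name)")
conn.execute("CREATE INDEX ix_test ON t5(text)")

# Same value pattern as the stored procedure: name is constant,
# text is 'text' + a counter (a smaller range keeps the demo fast).
rows = (("god-jiang666", f"text{i}") for i in range(1000000, 1010000))
conn.executemany("INSERT INTO t5(name, text) VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM t5").fetchone()[0])  # 10000
```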

When page turns are relatively infrequent, LIMIT will not cause any performance issues.

But what if the user needs to find the last page?

Normally, we need every page to be reachable, and we do not switch to a reversed query such as ORDER BY xxx DESC for the later pages; we simply page through in forward order:

select * from t5 order by text limit 1000000, 10;

With this kind of paged query, extracting those 10 rows from 2 million records is very expensive: MySQL must sort and produce the first 1,000,010 records, then discard the first 1,000,000 of them. On my MacBook Pro this took 5.578 seconds.
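To make that access pattern concrete, here is a tiny Python simulation, not MySQL itself, just an illustration of what LIMIT offset, n forces the engine to do: walk the rows in sort order and throw away the first offset of them.

```python
# Toy model of LIMIT offset, n over an ordered scan: the engine must
# step through offset + n rows in sort order and discard the first offset.
def limit_offset(sorted_rows, offset, n):
    examined = 0
    page = []
    for row in sorted_rows:
        examined += 1
        if examined > offset:
            page.append(row)
        if len(page) == n:
            break
    return page, examined

rows = [f"text{i}" for i in range(1, 101)]
page, examined = limit_offset(sorted(rows), 90, 10)
print(examined)  # 100 rows touched just to return 10
```

The larger the offset, the more rows are examined and discarded, which is exactly why the deep pages above are so slow.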

Next, let's take a look at the execution plan of the above SQL statement:

explain select * from t5 order by text limit 1000000, 10;

From the execution plan, we can see that in the case of large paging, MySQL does not perform index scanning, even though I have added an index to the text field.

Why is this?

As mentioned in MySQL Index (II): How to Design an Index, MySQL's query optimizer is cost-based, and its query cost estimate is built from CPU cost and I/O cost.

If MySQL believes that a full table scan is more efficient than an index scan in its query cost estimation, it will abandon the index and perform a full table scan directly.

This is why, in SQL queries with large pages, even though the field is indexed, MySQL still performs a full table scan.

We can verify this guess by running the same query with two adjacent offsets:

explain select * from t5 order by text limit 7774, 10;

explain select * from t5 order by text limit 7775, 10;

The above experiments were all run on my MBP, where 7774 turned out to be the critical point: at offset 7774 MySQL chose an index scan, and at 7775 it switched to a full table scan.

This supports the assumption that MySQL's cost-based query optimizer decides on its own whether using the index is worthwhile.

Since we cannot intervene in the internals of the query optimizer, our optimization strategy should focus on keeping the paging query within the range where the optimizer still chooses the index.

Optimization method

1. Use a covering index

If a SQL statement can directly obtain the query result through the index without returning to the table for query, this index is called a covering index.

Use the EXPLAIN keyword to view the execution plan: if the Extra column shows Using index, the statement is being answered from a covering index.

Let's compare how much performance can be improved by using a covering index.

# Without a covering index
select * from t5 order by text limit 1000000, 10;

On this run the query took 3.690 seconds. Let's see how much the covering index improves it.

# With a covering index
select id, `text` from t5 order by text limit 1000000, 10;

Comparing the two: for this very large page, the covering-index query took 0.201 seconds versus 3.690 seconds without it, more than 18 times faster. In real development this is a major performance win. (These numbers were measured on my MBP.)
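The same effect can be observed in the SQLite stand-in. The plan wording differs from MySQL's Using index, but the idea is identical: a secondary index entry already carries the indexed column plus the row's primary key, so SELECT id, text never needs to touch the table. (This is a sketch; the expected plan text is SQLite's, not MySQL's.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t5 (id INTEGER PRIMARY KEY, name TEXT, text TEXT)")
conn.execute("CREATE INDEX ix_test ON t5(text)")

# SELECT id, text can be answered from ix_test alone, so the planner
# reports a covering-index scan instead of a table lookup per row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, text FROM t5 ORDER BY text LIMIT 100, 10"
).fetchall()
print(plan[0][3])  # e.g. 'SCAN t5 USING COVERING INDEX ix_test'
```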

2. Subquery optimization

In actual development, SELECT operations to query one or two columns are very rare, so the application scope of the above covering index is relatively limited.

Therefore, we can improve performance by rewriting the paging SQL statement into a subquery.

select * from t5 where id>=(select id from t5 order by text limit 1000000, 1) limit 10;

In fact, the efficiency improvement achieved by using this method is basically the same as that achieved by using the covering index above.

However, this optimization method also has limitations:

  • This rewrite requires the primary key id to be continuous (more precisely, id order must match the sort order)
  • You cannot add any other conditions to the WHERE clause
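Both the equivalence and the precondition can be checked with a small SQLite-based sketch. The table is a toy stand-in, and the fixed-width text values are deliberately what make id order match text order, i.e. exactly the condition this rewrite depends on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t5 (id INTEGER PRIMARY KEY, name TEXT, text TEXT)")
conn.execute("CREATE INDEX ix_test ON t5(text)")
# Fixed-width counters keep id order and text order identical,
# which is the precondition the subquery rewrite depends on.
conn.executemany(
    "INSERT INTO t5(id, name, text) VALUES (?, ?, ?)",
    [(i, "god-jiang666", f"text{i:07d}") for i in range(1, 1001)],
)

plain = conn.execute(
    "SELECT * FROM t5 ORDER BY text LIMIT 500, 10"
).fetchall()
rewritten = conn.execute(
    "SELECT * FROM t5 WHERE id >= "
    "(SELECT id FROM t5 ORDER BY text LIMIT 500, 1) LIMIT 10"
).fetchall()
print(plain == rewritten)  # True
```

The inner query pages over the index alone to find the boundary id; the outer query then fetches just 10 full rows starting from it.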

3. Delayed association

Similar to the subquery approach above, we can use a JOIN to do the paging on the indexed column first, and only then go back to the table to fetch the remaining columns.

select a.* from t5 a inner join (select id from t5 order by text limit 1000000, 10) b on a.id=b.id;

The experiment shows that after rewriting with a JOIN, the two limitations above no longer apply, and execution efficiency is not sacrificed.
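Here is the same check for the JOIN rewrite in the SQLite stand-in. The ids are deliberately left with gaps to show that, unlike the subquery rewrite, a contiguous primary key is not required. Note one defensive addition: an explicit outer ORDER BY, since neither MySQL nor SQLite guarantees the row order coming out of a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t5 (id INTEGER PRIMARY KEY, name TEXT, text TEXT)")
conn.execute("CREATE INDEX ix_test ON t5(text)")
# Ids deliberately have gaps: the JOIN rewrite does not
# require a contiguous primary key.
conn.executemany(
    "INSERT INTO t5(id, name, text) VALUES (?, ?, ?)",
    [(i * 3, "god-jiang666", f"text{i:07d}") for i in range(1, 1001)],
)

plain = conn.execute(
    "SELECT * FROM t5 ORDER BY text LIMIT 500, 10"
).fetchall()
# Page on the indexed column in the derived table, then join back
# for the full rows; the outer ORDER BY pins the final row order.
deferred = conn.execute(
    "SELECT a.* FROM t5 a "
    "JOIN (SELECT id FROM t5 ORDER BY text LIMIT 500, 10) b ON a.id = b.id "
    "ORDER BY a.text"
).fetchall()
print(plain == deferred)  # True
```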

4. Record the location where the last query ended

Unlike the methods above, this approach records where the previous page ended in a variable, and the next page starts scanning directly from that position, so MySQL no longer scans a large number of rows only to discard them.

select * from t5 where id>=1000000 limit 10;

From the experiments above, it is easy to conclude that this SQL is the fastest of all, because the paging is done directly on the primary key index.
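In application code this becomes a simple loop that remembers the last id returned. A minimal sketch against the SQLite stand-in follows; it uses the `id > last_id` variant, whereas the article's `id >= 1000000` form assumes you already know the first id of the next page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t5 (id INTEGER PRIMARY KEY, name TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO t5(id, name, text) VALUES (?, ?, ?)",
    [(i, "god-jiang666", f"text{i}") for i in range(1, 101)],
)

def next_page(conn, last_id, page_size=10):
    # Seek directly past the last row we returned instead of
    # re-scanning and discarding an ever-growing offset.
    return conn.execute(
        "SELECT * FROM t5 WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = next_page(conn, 0)
page2 = next_page(conn, page1[-1][0])  # resume from the last id seen
print(page2[0][0], page2[-1][0])  # 11 20
```

The trade-off is that this only supports "next page" style navigation; jumping to an arbitrary page number still needs one of the earlier techniques.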

Summary

  • Explained why very large paging queries perform poorly, and shared several optimization ideas
  • The core idea for very large pages is to keep the paging SQL inside its best-performing range and avoid triggering a full table scan
  • I hope this sharing helps you avoid some detours on the MySQL road~

