Practical record of solving MySQL deep paging problem

Preface

When we implement paging in day-to-day development, we usually use limit, but when the offset gets very large, query efficiency drops sharply. This article discusses four ways to optimize deep paging over millions of rows of MySQL data, and ends with a recent production case of optimizing a slow paging SQL.

Why does limit deep paging become slower?

Let’s look at the following table structure first:

CREATE TABLE account (
  id int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key Id',
  name varchar(255) DEFAULT NULL COMMENT 'Account name',
  balance int(11) DEFAULT NULL COMMENT 'Balance',
  create_time datetime NOT NULL COMMENT 'Creation time',
  update_time datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (id),
  KEY idx_name (name),
  KEY idx_update_time (update_time)
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='Account table';

Assume that the SQL executed for deep paging is as follows:

select id,name,balance from account where update_time> '2020-09-19' limit 100000,10;

This SQL takes 0.742 seconds to complete. Why does deep paging make it so slow? Changing it to limit 0,10 takes only 0.006 seconds.

Let's first look at the execution process of this SQL:

  1. Using the secondary index tree idx_update_time, filter on the update_time condition and find the primary key ids of the records that satisfy it.
  2. With each id, go back to the primary key (clustered) index tree, locate the full row and read the selected columns (a table return).
  3. Repeat this for the 100010 rows that satisfy the condition, then discard the first 100000 rows and return the last 10.

SQL execution process

The execution plan is as follows:
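To inspect the plan yourself, you can simply prefix the statement with EXPLAIN; this sketch reproduces it for the query above:

-- Reproduce the plan of the slow deep-paging query
EXPLAIN select id,name,balance from account where update_time > '2020-09-19' limit 100000,10;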

There are two reasons why SQL becomes slow:

  1. The limit clause first scans offset+n rows, then discards the first offset rows and returns the next n rows. In other words, limit 100000,10 scans 100010 rows, whereas limit 0,10 scans only 10 (the comparison below shows both forms).
  2. With limit 100000,10, scanning more rows also means doing more table returns.
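If you want to see the effect of the offset for yourself, you can run the two forms side by side (a simple comparison sketch; exact timings will vary by environment):

-- Scans roughly offset + n = 100010 index entries and does about as many table returns
select id,name,balance from account where update_time > '2020-09-19' limit 100000,10;
-- Scans and returns only 10 rows
select id,name,balance from account where update_time > '2020-09-19' limit 0,10;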

Optimization through subqueries

The SQL above performs 100010 table returns, yet we only need 10 rows of data, i.e. only 10 table returns are actually necessary. So we can optimize by reducing the number of table returns.

Review of B+ Tree Structure

So, how to reduce the number of table returns? Let's review the B+ tree index structure first~

In InnoDB, indexes are divided into the primary key index (clustered index) and secondary indexes:

  • In the primary key index, the leaf nodes store the entire row of data.
  • In a secondary index, the leaf nodes store the primary key value, which is what makes the covering-index sketch below possible.
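This is why a query that selects only the primary key id can be answered entirely from the secondary index, with no table return at all (a covering index scan). A minimal illustrative sketch against the account table above:

-- Served entirely by the idx_update_time secondary index: its leaf nodes already
-- hold the primary key id, so the clustered index never needs to be visited
select id from account where update_time >= '2020-09-19' limit 100000, 1;

This covering-index lookup is exactly what the subquery optimization below exploits.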

Transfer the condition to the primary key index tree

If we move the filtering over to the primary key index tree, we can cut down the number of table returns. But a lookup on the primary key index tree has to filter by the primary key id, so what do we do with the original update_time condition? We push it into a subquery.

How is the subquery built? Because the leaf nodes of the secondary index already contain the primary key id, we can look up the primary key ids by update_time alone, and we also move the limit 100000 offset into the subquery. The complete SQL is as follows:

select id,name,balance FROM account where id >= (select a.id from account a where a.update_time >= '2020-09-19' limit 100000, 1) LIMIT 10;

The query returns the same result, and the execution time drops to just 0.038 seconds!

Let's look at the execution plan.

From the execution plan we can see that the subquery (table a) uses the idx_update_time index. The subquery reads the primary key id straight from that index, so it needs no table return; the outer query then simply fetches 10 rows starting from the id produced by the subquery.

So this approach works~

INNER JOIN Delayed Join

The idea behind a delayed join is the same as the subquery approach: move the condition onto the primary key index tree and reduce the number of table returns. The difference is that the delayed join uses an inner join instead of a subquery.

The optimized SQL is as follows:

SELECT acct1.id,acct1.name,acct1.balance FROM account acct1 INNER JOIN (SELECT a.id FROM account a WHERE a.update_time >= '2020-09-19' ORDER BY a.update_time LIMIT 100000, 10) AS acct2 on acct1.id= acct2.id;

The query performs just as well, taking only 0.034 seconds.

The execution plan tells the same story: the query first finds the qualifying primary key ids through the idx_update_time secondary index tree, then joins back to the original table on those ids. From then on the primary key index is used directly, and the number of table returns is reduced as well.

Tag Recording

The root cause of the limit deep paging problem is that the larger the offset, the more rows MySQL scans and then throws away, which drags query performance down.

We can avoid this with the tag recording method: record which row was read last, and start scanning right after it on the next query. It works like reading a book: fold the corner or put a bookmark where you stopped, and next time you can turn straight to that page.

Assuming the id of the last record read was 100000, the SQL can be modified to:

select id,name,balance FROM account where id > 100000 order by id limit 10;

In this way, no matter how many pages you turn later, performance stays good because the query hits the id index. But the method has a limitation: it needs a column like a continuously increasing id that the scan can be ordered and resumed on.
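As a sketch of how consecutive pages chain together (the value 100010 below is just a hypothetical id taken from the last row of the previous page):

-- First page
select id,name,balance from account order by id limit 10;
-- Next page: start right after the last id returned by the previous page
select id,name,balance from account where id > 100010 order by id limit 10;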

Use between...and...

In many cases, a limit query can be converted into a query over a known position range, so that MySQL can fetch the result with a range scan using between ... and.

If you know the boundary ids are 100000 and 100010, you can optimize it like this:

select id,name,balance FROM account where id between 100000 and 100010 order by id desc;

Hands-on practical cases

Let's look at a real case. Assume the table structure is as follows, and the table holds 2 million rows of data.

CREATE TABLE account (
  id varchar(32) COLLATE utf8_bin NOT NULL COMMENT 'Primary key',
  account_no varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'Account number',
  amount decimal(20,2) DEFAULT NULL COMMENT 'Amount',
  type varchar(10) COLLATE utf8_bin DEFAULT NULL COMMENT 'Type: A, B',
  create_time datetime DEFAULT NULL COMMENT 'Creation time',
  update_time datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (id),
  KEY idx_account_no (account_no),
  KEY idx_create_time (create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Account table';

The business requirement is as follows: fetch the type A account data created since 2021 and report it to the big data platform.

The usual implementation

Many developers, on receiving this requirement, will implement it directly like this:

// Query the total number of rows to report
Integer total = accountDAO.countAccount();

// SQL for counting the rows to report
<select id="countAccount" resultType="java.lang.Integer">
  select count(1)
    from account
   where create_time >= '2021-01-01 00:00:00'
     and type = 'A'
</select>

// Number of rows per page
int pageSize = 100;
// Calculate the number of pages
int pageNo = total % pageSize == 0 ? total / pageSize : (total / pageSize + 1);

// Query page by page and report
for (int i = 0; i < pageNo; i++) {
    int startRow = i * pageSize;
    List<AccountPO> list = accountDAO.listAccountByPage(startRow, pageSize);
    // Report to the big data platform
    postBigData(list);
}

// Paging query SQL (may hit the limit deep paging problem, because the account table holds millions of rows)
<select id="listAccountByPage">
  select *
    from account
   where create_time >= '2021-01-01 00:00:00'
     and type = 'A'
   limit #{startRow}, #{pageSize}
</select>

Practical optimization plan

The implementation above will run into the limit deep paging problem, because the account table holds millions of rows. So how do we optimize it?

We can use the tag recording method. Some readers may be puzzled: the id primary key here is not continuous, so can tag recording really be used?

Yes. The id does not need to be continuous, it only needs to be ordered, and order by id gives us that order. The optimized scheme is as follows:

// Query the minimum id that meets the conditions, as the starting tag
String lastId = accountDAO.queryMinId();

// SQL for querying the minimum id
<select id="queryMinId" resultType="java.lang.String">
  select MIN(id)
    from account
   where create_time >= '2021-01-01 00:00:00'
     and type = 'A'
</select>

// Number of rows per page
Integer pageSize = 100;

List<AccountPO> list;
do {
    list = accountDAO.listAccountByPage(lastId, pageSize);
    if (CollectionUtils.isEmpty(list)) {
        break;
    }
    // Tag recording: remember the id of the last row queried this time
    lastId = list.get(list.size() - 1).getId();
    // Report to the big data platform
    postBigData(list);
} while (list.size() == pageSize);

<select id="listAccountByPage">
  select *
    from account
   where create_time >= '2021-01-01 00:00:00'
     and id > #{lastId}
     and type = 'A'
   order by id asc
   limit #{pageSize}
</select>
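To double-check that each batch now avoids scanning and discarding offset rows, you can EXPLAIN one iteration of the paging query; the literal below is a hypothetical stand-in for the #{lastId} tag recorded from the previous batch:

-- 'xxxx' stands for the last id reported in the previous batch (hypothetical value)
EXPLAIN select * from account
 where create_time >= '2021-01-01 00:00:00'
   and id > 'xxxx'
   and type = 'A'
 order by id asc
 limit 100;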

Summary

This article explained why limit deep paging becomes slow and walked through four ways to optimize it: a subquery, an INNER JOIN delayed join, tag recording, and between ... and, finishing with a production case that uses tag recording to report millions of rows in batches.
