MySQL deep paging (how to quickly paginate tens of millions of data)

Preface

In backend development, paging is often required to avoid loading too much data at once and causing excessive memory and disk I/O overhead, and that is where MySQL's LIMIT keyword comes in. But do you think everything is fine once you paginate with LIMIT? Not quite. When the data volume is large, one problem that LIMIT can easily run into is deep paging.

Case

Here I take the display of e-commerce order details as an example; the table is created as follows:

CREATE TABLE `cps_user_order_detail` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'User ID',
  `order_id` bigint(20) DEFAULT NULL COMMENT 'Order id',
  `sku_id` bigint(20) unsigned NOT NULL COMMENT 'Product ID',
  `order_time` datetime DEFAULT NULL COMMENT 'Order time, format yyyy-MM-dd HH:mm:ss',
   PRIMARY KEY (`id`),
   KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User order details';

Then manually insert 1.2 million records into the table.
Now there is a requirement: display the user's order details in pages, in reverse order of order time.
The table structure is lean and the requirement is simple, so I finished the code quickly and put it online for testing. Everything ran normally at first, but as the order volume kept growing, the system became slower and slower, and slow queries were reported from time to time.
At this point you should suspect the LIMIT offset. Yes, it is not that your SQL is not elegant enough; it is MySQL's own mechanism.
Here I simply take two SQL statements as an example, paging from offset 100 and from offset 1,000,000 respectively. The time difference is huge, and that does not even include the time for other data calculation and processing. A single SQL query taking more than one second is intolerable for user-facing features (e-commerce often requires an interface RT of no more than 200 ms).
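The timing screenshot is not reproduced here, but the two statements were roughly of the following shape (the exact time filter is my assumption; only the offset differs):

-- Shallow paging: MySQL skips only 100 rows before returning 6, so it is fast.
SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00'
ORDER BY order_time
LIMIT 100, 6;

-- Deep paging: MySQL must locate and discard 1,000,000 rows before returning 6,
-- which is where the one-second-plus latency comes from.
SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00'
ORDER BY order_time
LIMIT 1000000, 6;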

Here we look at the execution plan, as shown below:
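The plan screenshots are likewise not reproduced, but they can be regenerated by prefixing either query with EXPLAIN and comparing the type and Extra columns of the two outputs, for example:

-- Re-run the deep-paging statement through EXPLAIN to inspect its plan.
EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00'
ORDER BY order_time
LIMIT 1000000, 6;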

Here we first introduce the possible values and meanings of the Extra column in the execution plan:

  1. Using where: indicates that the optimizer needs to go back to the table through the index to look up the required data.
  2. Using index: covering index, meaning the required data can be obtained directly from the index without going back to the table; this is usually achieved by building a composite index on the queried fields (a quick sketch follows this list).
  3. Using index condition: added in version 5.6, the well-known index condition pushdown, a major MySQL optimization that reduces the number of table lookups.
  4. Using filesort: file sort, which usually appears with ORDER BY. When the data volume is too large, MySQL pulls all the matching rows into memory for sorting, which consumes considerable resources.
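As a quick illustration of Using index versus a table lookup (these queries are mine, not from the original screenshots): selecting only columns contained in idx_time_user lets MySQL answer from the index alone, so Extra would typically include Using index, while SELECT * needs the remaining columns and must fetch rows from the table as well.

-- Covered by idx_time_user (order_time, user_id): no table lookup needed.
EXPLAIN SELECT order_time, user_id FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00';

-- SELECT * requires columns outside the index, so rows must also be read from the table.
EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00';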

Comparing the two plans, the same statement produces very different execution plans just because of the different offsets (allow me to exaggerate a little). For the first statement, LIMIT 100,6, the value of the type column is range, which indicates a range scan. Its performance is one level below ref, but it still uses the index, and index condition pushdown is applied: the filter on order_time in the WHERE clause is evaluated inside the index, and the subsequent ORDER BY also benefits from the pushdown, being carried out while the WHERE condition is filtered, without returning to the table.
The second statement, LIMIT 1000000,6, does not use the index at all; the value of the type column is ALL, which is obviously a full table scan. In the Extra column, Using where indicates that a table lookup occurs, and Using filesort indicates that a file sort happens during ORDER BY. So there are two reasons for the slowness: first, the file sort takes too much time; second, after filtering the matching rows by the condition, MySQL still has to fetch the full rows for the entire offset before it can return the few we actually want. Either way, the root cause is that the LIMIT offset is too large, which is why real projects often require that non-statistical tables be kept within the million-row level.

Optimization

Now that the cause has been analyzed, how can we optimize LIMIT deep paging in actual development? Here I give you two solutions.
The first is optimization based on the primary key index. What does that mean? Simply modify the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time>'2020-8-5 00:00:00' ORDER BY d.order_time LIMIT 6;

As the statement above shows, the query is still paginated, but with a maxId restriction. What does that mean? maxId is the largest primary key id of the previous page. So the preconditions for this approach are: 1) the primary key must be auto-incrementing rather than a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also send back the largest id of the previous page with each request; 2) it does not support random page jumps, i.e., you can only page forwards and backwards one page at a time. The order list of a well-known e-commerce app actually works exactly this way.
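For illustration only, the two kinds of requests under this scheme might look as follows (the maxId value 1006 is hypothetical; as in the statement above, this relies on the auto-increment id growing in the same order as order_time):

-- Page 1: there is no maxId yet, so only the time filter applies.
SELECT * FROM cps_user_order_detail d WHERE d.order_time > '2020-8-5 00:00:00' ORDER BY d.order_time LIMIT 6;

-- Later pages: the front end sends back the largest id of the previous page (here 1006),
-- letting MySQL seek directly on the primary key instead of discarding offset rows.
SELECT * FROM cps_user_order_detail d WHERE d.id > 1006 AND d.order_time > '2020-8-5 00:00:00' ORDER BY d.order_time LIMIT 6;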

The second is optimization through the Elasticsearch search engine (based on the inverted index). In fact, e-commerce companies such as Taobao basically put all their products into ES (it is impossible to hold such massive data in MySQL, and putting it in Redis is not realistic either). But even with ES, deep paging problems can still appear. What then? The answer is the scroll cursor. We won't go into detail on that here; those who are interested can look into it.

Summary

I wrote this post because I actually ran into the problem during development a while ago, and I did discuss it with the interviewer in a ByteDance interview. Knowing the limitations of LIMIT and how to optimize around them is a plus if you can bring it up in an interview. Don't just say that MySQL optimization means building indexes and tuning SQL (in real development the effect of those two alone is often limited). After all, if MySQL optimization were that powerful, there wouldn't be so many middleware systems.
