MySQL deep paging (how to quickly paginate tens of millions of data)

Preface

In backend development, paging is often needed to keep a single request from loading too much data and incurring excessive memory and disk I/O overhead, which is where MySQL's LIMIT keyword comes in. But do you think everything is fine once you paginate with LIMIT? Too young, too simple. When the data volume is large, LIMIT is likely to run into a problem known as deep paging.

Case

Here I take the display of e-commerce order details as an example and create a new table as follows:

CREATE TABLE `cps_user_order_detail` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'User ID',
  `order_id` bigint(20) DEFAULT NULL COMMENT 'Order id',
  `sku_id` bigint(20) unsigned NOT NULL COMMENT 'Product ID',
  `order_time` datetime DEFAULT NULL COMMENT 'Order time, format yyyy-MM-dd HH:mm:ss',
   PRIMARY KEY (`id`),
   KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User order details';

Then manually insert 1.2 million records into the table.
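One minimal way to generate such test data is to seed a single row and then repeatedly double the table with INSERT ... SELECT (a sketch; all column values are illustrative, and roughly 21 doubling passes reach the 1-2 million range):

-- Seed one row (made-up values)
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
VALUES ('user_0001', 1, 1, '2020-08-05 12:00:00');

-- Each pass doubles the row count; order_time is jittered so the
-- idx_time_user index has realistic values to sort
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
SELECT user_id, order_id, sku_id,
       order_time + INTERVAL FLOOR(RAND() * 86400 * 30) SECOND
FROM cps_user_order_detail;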
Now there is a requirement: display the user's order details in pages, in reverse order of order time.
The table structure is lean and the requirement is simple, so I finished the code quickly and put it online for testing. Everything ran normally at first, but as the order volume grew, the system became slower and slower, and slow queries were reported from time to time.
At this point you should suspect a LIMIT offset problem. Indeed, it is not that your SQL isn't well written; it is MySQL's own mechanism.
Here I simply take two SQL statements as an example, shown in the figure below, paging from offset 100 and from offset 1,000,000 respectively. You can see that the time difference is huge, and that does not even include the time for other data calculation and processing. A single SQL query taking more than a second is intolerable in features exposed to users (e-commerce often requires an interface RT under 200 ms).
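For reference, the two statements were presumably of the following shape (a sketch reconstructed from the keyset query shown later; the exact originals appear only in the figure):

SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 100, 6;      -- shallow offset: fast

SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 1000000, 6;  -- deep offset: slow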

Here we look at the execution plan, as shown below:
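To reproduce the plans yourself, prefix the queries with EXPLAIN (the exact output depends on MySQL version and data distribution):

EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00' ORDER BY order_time LIMIT 100, 6;

EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00' ORDER BY order_time LIMIT 1000000, 6;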

Here we first introduce the possible values and meanings of the Extra column in the execution plan:

  1. Using where: the server filters rows with the WHERE condition after reading them; when an index is involved, this generally means going back to the table for columns the index does not cover.
  2. Using index: a covering index. The index alone supplies all the required data, so no table lookup is needed. This is usually achieved by building a composite index over exactly the columns being queried.
  3. Using index condition: index condition pushdown (ICP), introduced in MySQL 5.6; a major MySQL optimization that reduces the number of times the engine must go back to the table.
  4. Using filesort: a file sort, usually triggered by ORDER BY. When the result set is too large to sort in memory, MySQL falls back to sorting with temporary files on disk, which is expensive.

Looking at the figure above, the same statement produces very different execution plans merely because the offset differs (allow me a slight exaggeration). For the first statement, LIMIT 100,6, the value of the type column is range, indicating a range scan. Its performance is one level below ref, but it still uses the index, and index condition pushdown applies: the order_time filter in the WHERE clause is evaluated inside the index, and the subsequent ORDER BY also benefits from the pushdown, being satisfied while the WHERE condition is filtered, without going back to the table.
The second statement, LIMIT 1000000,6, does not use the index at all; the value of the type column is ALL, which is obviously a full table scan. In the Extra column, Using where indicates that table lookups occur, and Using filesort indicates that ORDER BY triggers a file sort. So the slowness has two causes: first, the file sort is expensive; second, after filtering by the condition, MySQL must still read and discard the first 1,000,000 rows (fetching their values from the table) before returning the 6 that were requested. Either way, the root cause is the oversized LIMIT offset, which is why production environments often impose rules such as keeping non-statistical tables under roughly a million rows.

Optimization

Now that the cause has been analyzed, how can we optimize LIMIT deep paging in actual development? Here I give you two solutions.
The first is optimization via the primary key index (sometimes called keyset pagination). What does that mean? Simply modify the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time > '2020-08-05 00:00:00' ORDER BY d.order_time LIMIT 6;

As shown in the code above, this is still pagination, but with a maxId constraint. What does that mean? maxId is the largest primary key id on the previous page. The preconditions for this approach are: 1) the primary key must be auto-incrementing, not a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also send back the largest id of each previous page; 2) random page jumps are not supported, i.e. you can only page forward and backward (see the sketch below). The following figure shows an actual page from a well-known e-commerce company.
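A hypothetical page-to-page flow, assuming (as the preconditions above imply) that auto-increment id order matches order_time order, and using 106 as a made-up maxId:

-- Page 1: shallow, a plain LIMIT is fine; the client remembers the largest id returned
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 6;

-- Page 2: the front end sends that id back as maxId; InnoDB can seek straight
-- to id > 106 on the primary key instead of scanning and discarding an offset
SELECT * FROM cps_user_order_detail d
WHERE d.id > 106 AND d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 6;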

The second is optimization via the Elasticsearch search engine (based on inverted indexes). E-commerce companies like Taobao basically put all their products into ES: it is impossible to hold such massive data in MySQL, and putting it into Redis is not realistic either. But even with ES, deep paging problems can still occur. What then? The answer is the scroll cursor. We won't go into detail here; interested readers can look into it.

Summary

I wrote this post because I actually ran into this during development a while ago, and I also discussed it with the interviewer in a ByteDance interview. Knowing the limitations of LIMIT and how to optimize around them is a plus if you can bring it up in an interview. Don't claim MySQL optimization is just building indexes and tuning SQL (in real development those two deliver limited gains). After all, if MySQL optimization were that powerful, there wouldn't be so many middlewares.
