MySQL deep paging (how to quickly paginate tens of millions of data)

Preface

In backend development, paging is often needed to keep a single request from loading too much data and incurring excessive memory and disk I/O overhead, which is where MySQL's LIMIT keyword comes in. But do you think everything is fine once you paginate with LIMIT? Too young, too simple. When the data volume is large, LIMIT is likely to run into a problem known as deep paging.

Case

Here I take the display of e-commerce order details as an example and create a new table as follows:

CREATE TABLE `cps_user_order_detail` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'User ID',
  `order_id` bigint(20) DEFAULT NULL COMMENT 'Order id',
  `sku_id` bigint(20) unsigned NOT NULL COMMENT 'Product ID',
  `order_time` datetime DEFAULT NULL COMMENT 'Order time, format yyyy-MM-dd HH:mm:ss',
   PRIMARY KEY (`id`),
   KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User order details';

Then manually insert 1.2 million records into the table.
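One minimal way to generate such test data is to seed a single row and then repeatedly double the table with INSERT ... SELECT (a sketch; all column values are illustrative, and roughly 21 doubling passes reach the 1-2 million range):

-- Seed one row (made-up values)
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
VALUES ('user_0001', 1, 1, '2020-08-05 12:00:00');

-- Each pass doubles the row count; order_time is jittered so the
-- idx_time_user index has realistic values to sort
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
SELECT user_id, order_id, sku_id,
       order_time + INTERVAL FLOOR(RAND() * 86400 * 30) SECOND
FROM cps_user_order_detail;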
Now there is a requirement: display the user's order details in pages, in reverse order of order time.
The table structure is lean and the requirement is simple, so I finished the code quickly and put it online for testing. Everything ran normally at first, but as the order volume grew, the system became slower and slower, and slow queries were reported from time to time.
At this point you should suspect a LIMIT offset problem. Indeed, it is not that your SQL isn't well written; it is MySQL's own mechanism.
Here I simply take two SQL statements as an example, shown in the figure below, paging from offset 100 and from offset 1,000,000 respectively. You can see that the time difference is huge, and that does not even include the time for other data calculation and processing. A single SQL query taking more than a second is intolerable in features exposed to users (e-commerce often requires an interface RT under 200 ms).
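For reference, the two statements were presumably of the following shape (a sketch reconstructed from the keyset query shown later; the exact originals appear only in the figure):

SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 100, 6;      -- shallow offset: fast

SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 1000000, 6;  -- deep offset: slow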

Here we look at the execution plan, as shown below:
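To reproduce the plans yourself, prefix the queries with EXPLAIN (the exact output depends on MySQL version and data distribution):

EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00' ORDER BY order_time LIMIT 100, 6;

EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00' ORDER BY order_time LIMIT 1000000, 6;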

Here we first introduce the possible values and meanings of the Extra column in the execution plan:

  1. Using where: the server filters rows with the WHERE condition after reading them; when an index is involved, this generally means going back to the table for columns the index does not cover.
  2. Using index: a covering index. The index alone supplies all the required data, so no table lookup is needed. This is usually achieved by building a composite index over exactly the columns being queried.
  3. Using index condition: index condition pushdown (ICP), introduced in MySQL 5.6; a major MySQL optimization that reduces the number of times the engine must go back to the table.
  4. Using filesort: a file sort, usually triggered by ORDER BY. When the result set is too large to sort in memory, MySQL falls back to sorting with temporary files on disk, which is expensive.

Looking at the figure above, the same statement produces very different execution plans merely because the offset differs (allow me a slight exaggeration). For the first statement, LIMIT 100,6, the value of the type column is range, indicating a range scan. Its performance is one level below ref, but it still uses the index, and index condition pushdown applies: the order_time filter in the WHERE clause is evaluated inside the index, and the subsequent ORDER BY also benefits from the pushdown, being satisfied while the WHERE condition is filtered, without going back to the table.
The second statement, LIMIT 1000000,6, does not use the index at all; the value of the type column is ALL, which is obviously a full table scan. In the Extra column, Using where indicates that table lookups occur, and Using filesort indicates that ORDER BY triggers a file sort. So the slowness has two causes: first, the file sort is expensive; second, after filtering by the condition, MySQL must still read and discard the first 1,000,000 rows (fetching their values from the table) before returning the 6 that were requested. Either way, the root cause is the oversized LIMIT offset, which is why production environments often impose rules such as keeping non-statistical tables under roughly a million rows.

Optimization

Now that the cause has been analyzed, how can we optimize LIMIT deep paging in actual development? Here I give you two solutions.
The first is optimization via the primary key index (sometimes called keyset pagination). What does that mean? Simply modify the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time > '2020-08-05 00:00:00' ORDER BY d.order_time LIMIT 6;

As shown in the code above, this is still pagination, but with a maxId constraint. What does that mean? maxId is the largest primary key id on the previous page. The preconditions for this approach are: 1) the primary key must be auto-incrementing, not a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also send back the largest id of each previous page; 2) random page jumps are not supported, i.e. you can only page forward and backward (see the sketch below). The following figure shows an actual page from a well-known e-commerce company.
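A hypothetical page-to-page flow, assuming (as the preconditions above imply) that auto-increment id order matches order_time order, and using 106 as a made-up maxId:

-- Page 1: shallow, a plain LIMIT is fine; the client remembers the largest id returned
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 6;

-- Page 2: the front end sends that id back as maxId; InnoDB can seek straight
-- to id > 106 on the primary key instead of scanning and discarding an offset
SELECT * FROM cps_user_order_detail d
WHERE d.id > 106 AND d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time
LIMIT 6;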

The second is optimization via the Elasticsearch search engine (based on inverted indexes). E-commerce companies like Taobao basically put all their products into ES: it is impossible to hold such massive data in MySQL, and putting it into Redis is not realistic either. But even with ES, deep paging problems can still occur. What then? The answer is the scroll cursor. We won't go into detail here; interested readers can look into it.

Summary

I wrote this post because I actually ran into this during development a while ago, and I also discussed it with the interviewer in a ByteDance interview. Knowing the limitations of LIMIT and how to optimize around them is a plus if you can bring it up in an interview. Don't claim MySQL optimization is just building indexes and tuning SQL (in real development those two deliver limited gains). After all, if MySQL optimization were that powerful, there wouldn't be so many middlewares.
