How to quickly paginate MySQL tables with tens of millions of rows

Preface

In backend development, paged display is often needed to avoid loading too much data at once and incurring excessive memory and disk I/O overhead, which is where MySQL's LIMIT keyword comes in. But is everything fine once you paginate with LIMIT? Not quite: once the data volume grows large, LIMIT is prone to a problem known as deep paging.
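As a quick refresher, LIMIT takes an optional offset, and it is the offset form this article is about (t here is just a placeholder table name):

SELECT * FROM t ORDER BY id LIMIT 10;         -- first 10 rows
SELECT * FROM t ORDER BY id LIMIT 100000, 10; -- skip 100,000 rows, then return 10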

Case

Here I take an e-commerce order-detail page as the example, and create the table as follows:

CREATE TABLE `cps_user_order_detail` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
 `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'User ID',
 `order_id` bigint(20) DEFAULT NULL COMMENT 'Order id',
 `sku_id` bigint(20) unsigned NOT NULL COMMENT 'Product ID',
 `order_time` datetime DEFAULT NULL COMMENT 'Order time, format yyyy-MM-dd HH:mm:ss',
 PRIMARY KEY (`id`),
 KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User order details';

Then manually insert 1.2 million records into the table.
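The article does not show how the test data was generated; one way is a throwaway stored procedure like the sketch below. All value patterns (user ids, order ids, sku ids, timestamps) are illustrative assumptions, not from the original:

-- Illustrative seed script: ~1.2 million synthetic order rows.
DELIMITER $$
CREATE PROCEDURE seed_order_detail()
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < 1200000 DO
    INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
    VALUES (
      CONCAT('user_', i MOD 10000),                       -- spread rows across 10,000 users
      100000 + i,                                         -- synthetic order id
      FLOOR(1 + RAND() * 100000),                         -- random product id
      DATE_ADD('2020-01-01 00:00:00', INTERVAL i SECOND)  -- strictly increasing order time
    );
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL seed_order_detail();

(In practice you would batch the inserts or commit in chunks; one-row-at-a-time autocommit is painfully slow at this scale.)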

Now there is a requirement: display the user's order details in pages, in reverse order of order time.

The table structure is lean and the requirement is simple, so I quickly finished the code and shipped it. Everything ran normally in the early days, but as order volume kept growing, the system became increasingly slow, and slow queries were reported from time to time.

By this point you should suspect a LIMIT offset problem. And indeed, it is not that your SQL isn't beautiful enough; the cause lies in how MySQL itself handles LIMIT.

Here I take two SQL statements as examples, as shown in the figure below, paging from offset 100 and from offset 1,000,000 respectively. You can see the time difference is huge, and that does not even include the time for other computation and processing. A single SQL query taking more than one second is intolerable in user-facing functionality (e-commerce often requires the RT of an interface to stay under 200ms).
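The screenshot is not reproduced here, but based on the discussion that follows, the two statements were of this form (the date literal is taken from the later optimization example):

SELECT * FROM cps_user_order_detail d
 WHERE d.order_time > '2020-08-05 00:00:00'
 ORDER BY d.order_time
 LIMIT 100, 6;       -- shallow offset: fast

SELECT * FROM cps_user_order_detail d
 WHERE d.order_time > '2020-08-05 00:00:00'
 ORDER BY d.order_time
 LIMIT 1000000, 6;   -- deep offset: over a second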

Here we look at the execution plan, as shown below:
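The original plan screenshot is likewise missing; based on the analysis below, the two EXPLAIN results looked roughly like this (a reconstruction, not verbatim output):

EXPLAIN SELECT * FROM cps_user_order_detail d
 WHERE d.order_time > '2020-08-05 00:00:00'
 ORDER BY d.order_time LIMIT 100, 6;
-- type: range   key: idx_time_user   Extra: Using index condition

EXPLAIN SELECT * FROM cps_user_order_detail d
 WHERE d.order_time > '2020-08-05 00:00:00'
 ORDER BY d.order_time LIMIT 1000000, 6;
-- type: ALL     key: NULL            Extra: Using where; Using filesort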

Here we first introduce the possible values and meanings of the Extra column in the execution plan:

  1. Using where: the server filters rows with the WHERE condition after the storage engine returns them; when an index scan cannot fully resolve the query, this typically goes hand in hand with looking the full row up back in the table.
  2. Using index: a covering index. Accessing the index alone yields all the required data, with no lookup back to the table. This is usually achieved by building a composite index over exactly the columns the query needs.
  3. Using index condition: added in version 5.6, the well-known index condition pushdown (ICP), a major MySQL optimization for reducing the number of lookups back to the table.
  4. Using filesort: a file sort, usually seen with ORDER BY. MySQL sorts the qualifying rows in an in-memory sort buffer, and when the data does not fit, it spills to temporary files on disk, which consumes considerable resources.

Looking at the figure above: the statement is the same, yet merely changing the offset produces wildly different execution plans (allow me a small exaggeration). For the first statement, LIMIT 100,6, the type column is range, indicating a range scan; its performance is one level below ref, but it still uses the index, and index condition pushdown applies: the filter on order_time in the WHERE clause is evaluated inside the index itself, and the subsequent ORDER BY also rides on the same index, happening while the WHERE condition is being filtered, without going back to the table.
The second statement, LIMIT 1000000,6, does not use the index at all; its type column is ALL, which is obviously a full table scan. In its Extra column, Using where indicates a lookup back to the table, and Using filesort indicates a file sort during the ORDER BY. So the slowness has two causes: first, the file sort is time-consuming; second, after filtering rows by the condition, MySQL must still read full rows for the entire offset, only to discard the first 1,000,000 of them. Both costs grow with the LIMIT offset, which is why real development environments often impose the requirement that non-statistical tables must not exceed a million rows.

Optimization

Now that the cause has been analyzed, how do we optimize LIMIT deep paging in actual development? Here are two solutions.
The first is optimization via the primary key index. What does that mean? Just modify the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time > '2020-08-05 00:00:00' ORDER BY d.order_time LIMIT 6;

As the code shows, it is still pagination, but with a maxId bound. What does that mean? maxId is the largest primary key id on the previous page. The preconditions for this approach are therefore: 1) the primary key must be auto-incrementing, not a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also send back the largest id of the previous page with each request; 2) it does not support random page jumps, only paging to the previous or next page. The figure below shows an actual page from a well-known e-commerce site.
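Since the requirement is newest-first, a descending variant of this seek pattern is worth spelling out. This is my sketch, not the article's code; it assumes the client echoes back the order_time and id of the last row on the previous page as #{lastTime} and #{lastId}, and it breaks ties on id so that rows sharing an order_time paginate stably:

SELECT *
  FROM cps_user_order_detail d
 WHERE d.order_time < #{lastTime}
    OR (d.order_time = #{lastTime} AND d.id < #{lastId})
 ORDER BY d.order_time DESC, d.id DESC
 LIMIT 6;

The leading order_time column of idx_time_user lets MySQL seek straight to the boundary instead of counting past a huge offset.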

The second is to use the Elasticsearch search engine (based on inverted indexes). In fact, e-commerce companies like Taobao basically put all of their products into an ES index; data of that magnitude cannot realistically go into MySQL, and putting it into Redis is not realistic either. But even with ES, deep paging problems can still occur. What then? The answer is scrolling with a cursor (the scroll API). We won't go into detail on that here; those who are interested can do some research.

Summary

I wrote this post because I actually ran into the problem during development a while ago, and I genuinely discussed it in my ByteDance interview. Knowing the limitations of LIMIT and the ways around them is a plus if you can bring it up in an interview. Don't say MySQL optimization is just building indexes and adjusting SQL (in real development, those two measures alone often achieve very little). After all, if MySQL optimization were that powerful, there wouldn't be so much middleware.
I am Shaoxia Lu Fei; I love technology and I love sharing.
