How to quickly paginate MySQL data volumes of tens of millions

Preface

In backend development, data is often displayed in pages to avoid loading too much at once and incurring excessive memory and disk I/O overhead. This is where MySQL's LIMIT keyword comes in. But is everything fine once you paginate with LIMIT? Not quite: once the data volume grows large, LIMIT easily runs into a problem known as deep pagination.
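For context, this is what classic offset-based pagination looks like (a minimal sketch; the table is the order-detail table introduced in the case below, and the page size of 6 is illustrative):

-- Offset pagination: page N first skips (N - 1) * pageSize rows.
-- The deeper the page, the more rows MySQL must read and throw away.
SELECT * FROM cps_user_order_detail
ORDER BY order_time DESC
LIMIT 594, 6;  -- page 100 with a page size of 6: offset = 99 * 6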

Case

Here I take the display of e-commerce order details as an example. The table is created as follows:

CREATE TABLE `cps_user_order_detail` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
 `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'User ID',
 `order_id` bigint(20) DEFAULT NULL COMMENT 'Order id',
 `sku_id` bigint(20) unsigned NOT NULL COMMENT 'Product ID',
 `order_time` datetime DEFAULT NULL COMMENT 'Order time, format yyyy-MM-dd HH:mm:ss',
 PRIMARY KEY (`id`),
 KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User order details';

Then manually insert 1.2 million records into the table.
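The article does not show how the test data was generated; here is one hedged way to do it, assuming random users and order times spread across a year (the doubling INSERT ... SELECT trick):

-- Seed a single row first.
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
VALUES ('user_0', 1, 1, '2020-01-01 00:00:00');

-- Each run doubles the table by re-inserting a randomized copy of it,
-- so roughly 21 runs produce 1.2 million-plus rows.
INSERT INTO cps_user_order_detail (user_id, order_id, sku_id, order_time)
SELECT CONCAT('user_', FLOOR(RAND() * 10000)),
       FLOOR(RAND() * 100000000),
       FLOOR(RAND() * 100000),
       FROM_UNIXTIME(UNIX_TIMESTAMP('2020-01-01') + FLOOR(RAND() * 86400 * 365))
FROM cps_user_order_detail;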

Now there is a requirement: display the user's order details in pages, in reverse order of order time.

The table structure is streamlined and the requirements are simple. So I finished writing the code quickly and put it online for testing. Everything was running normally in the early days, but as the order volume continued to increase, the system became increasingly slow, and several slow queries were reported from time to time.

At this point you should suspect a LIMIT offset problem. And indeed, it is not that your SQL isn't elegant enough; it is MySQL's own mechanism.

Here I take two SQL statements as a simple example, paging from offset 100 and offset 1,000,000 respectively (see below). The difference in execution time is enormous, and that does not even include the time for other data calculation and processing. A single SQL query taking over one second is intolerable in features served to users (e-commerce often requires an interface RT of no more than 200ms).
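The original article shows the timings in a screenshot; reconstructed from the surrounding description, the two statements look roughly like this (the date literal mirrors the one used in the optimization section below):

-- Shallow offset: fast
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time LIMIT 100, 6;

-- Deep offset: over one second
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time LIMIT 1000000, 6;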

Now let's look at their execution plans:
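The plans in the original screenshot were presumably obtained by prefixing each statement with EXPLAIN, for example:

EXPLAIN SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time LIMIT 1000000, 6;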

First, let's go over the relevant values of the Extra column in the execution plan and what they mean:

  1. Using where: the server applies additional WHERE filtering to the rows handed back by the storage engine; combined with an index access, this typically means going back to the table for columns the index does not cover.
  2. Using index: a covering index is in use, meaning the index alone supplies all the required data and no table lookup is needed. This is usually achieved by building a composite index over the queried columns (see the sketch after this list).
  3. Using index condition: introduced in version 5.6, this is the well-known index condition pushdown, a major MySQL optimization for reducing the number of table lookups.
  4. Using filesort: a file sort, which usually appears with ORDER BY. MySQL reads the qualifying rows into a sort buffer in memory (spilling to disk when there is too much data), which is resource-intensive.
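To illustrate item 2, here is a hedged example of a query that the existing idx_time_user index can cover (InnoDB secondary indexes also carry the primary key, so selecting only order_time and user_id needs no table lookup):

-- Both selected columns live in idx_time_user, so the index alone
-- answers the query; EXPLAIN should show "Using index" in Extra.
EXPLAIN SELECT order_time, user_id
FROM cps_user_order_detail
WHERE order_time > '2020-08-05 00:00:00';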

Looking at the plans above, the very same statement yields vastly different execution plans purely because of the offset (allow me a little exaggeration). For the first statement, LIMIT 100,6, the type column's value is range, indicating a range scan. Its performance is one level below ref, but it still uses the index, and index condition pushdown applies: the order_time condition in the WHERE clause is filtered inside the index itself, and the subsequent ORDER BY likewise benefits from the pushdown, being carried out while the WHERE condition is filtered, without going back to the table.
For the second statement, LIMIT 1000000,6, the index is not used at all: the type column's value is ALL, which is obviously a full table scan. In its Extra column, Using where indicates table lookups occur, and Using filesort indicates that the ORDER BY triggers a file sort. So the slowness has two causes: the file sort takes too long, and after filtering rows by the condition, MySQL still has to fetch and discard all the rows before the offset just to return six of them. Either way, the root cause is an oversized LIMIT offset, which is why real development teams often impose the rule that non-statistical tables must stay under a million rows.

Optimization

Now that the cause has been analyzed, how do we optimize LIMIT deep pagination in actual development? Here are two solutions.
The first is optimization via the primary key index. What does that mean? Simply modify the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time > '2020-08-05 00:00:00' ORDER BY d.order_time LIMIT 6;

As the code above shows, the query is still paginated, but it is now anchored by maxId. What does that mean? maxId is the largest primary key id on the previous page. The preconditions for this approach are therefore: 1) the primary key must be auto-increment, not a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also send back the maximum id of each previous page; 2) random page jumps are not supported, only paging to the previous or next page (a page from a well-known e-commerce company works exactly this way). A sketch of how consecutive pages chain together follows below.
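Here is a hedged sketch of two consecutive page requests (the maxId value of 1206 is hypothetical; note the approach relies on the auto-increment id growing in step with order_time, which is why an auto-increment primary key is required):

-- Page 1: no maxId yet, so no id filter.
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time LIMIT 6;

-- Page 2: the front end sends back the largest id from page 1, say 1206,
-- letting MySQL seek directly past the previous page via the primary key.
SELECT * FROM cps_user_order_detail d
WHERE d.id > 1206 AND d.order_time > '2020-08-05 00:00:00'
ORDER BY d.order_time LIMIT 6;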

The second is to use the Elasticsearch search engine (based on the inverted index). E-commerce companies like Taobao basically put all of their products into ES: stuffing such massive data into MySQL is impossible, and putting it into Redis is not realistic either. But even with ES, deep paging problems can still occur. What then? The answer is the scroll cursor (the scroll API). We won't go into detail on it here; interested readers can look into it.

Summary

I wrote this post because I actually ran into this problem during development a while ago, and it did come up in my ByteDance interview. Knowing the limitations of LIMIT and how to optimize around them is a plus if you can bring it up in an interview. Don't say MySQL optimization is just building indexes and tweaking SQL; in real development, those two measures actually have the smallest effect. After all, if MySQL optimization were that powerful, there wouldn't be so much middleware.
I am Shaoxia Lu Fei. I love technology and I love sharing.
