MySQL million-level data paging query optimization solution

When a table holds tens of thousands of records or more, querying all the results from the database at once becomes very slow, and it only gets slower as the data grows. At that point paging queries are needed. There are many methods for paging queries in a database, and many points where they can be optimized; below are the ones I know of.

Preparation

To test some of the optimizations listed below, an existing table was used, described as follows.

Table name: orders_history
Description: the order history table of a certain business. Main fields: id (unsigned int), type (tinyint(4)).
Fields: the table has 37 fields in total, with no large types such as text; the largest is varchar(500). The id field is indexed and auto-incrementing.
Row count: 5,709,294
MySQL version: 5.7.16
It is not easy to find a test table with millions of rows outside of production. If you need to test this yourself, you can write a shell script to insert data.
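Alternatively, the test data can be generated directly inside MySQL. Below is a minimal sketch of such a stored procedure; it assumes a simplified orders_history schema, and the remark column is purely illustrative:

-- minimal sketch: bulk-insert dummy rows for paging tests
-- (assumes a simplified schema; the remark column is illustrative)
delimiter //
create procedure insert_test_data(in total int)
begin
    declare i int default 0;
    while i < total do
        insert into orders_history (type, remark)
        values (floor(rand() * 10), concat('test row ', i));
        set i = i + 1;
    end while;
end //
delimiter ;

call insert_test_data(1000000);

Row-by-row inserts like this are slow; batching several rows per insert or wrapping the loop in a transaction speeds up the generation considerably.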
The execution environment is the same for all of the SQL statements below. Here are the baseline test results:

select count(*) from orders_history;

Return result: 5709294

The three query times are:

8903 ms
8323 ms
8401 ms

General paging query

General paging queries can be implemented using a simple limit clause. The limit clause is declared as follows:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The LIMIT clause specifies the number of rows a SELECT statement returns. Note the following points:

The first parameter specifies the offset of the first row returned; the second parameter specifies the maximum number of rows returned.

If only one parameter is given, it indicates the maximum number of rows to return. The offset of the initial row is 0 (not 1). MySQL does not accept -1 as the row count; to retrieve everything from a given offset to the end of the result set, the documented trick is to pass some sufficiently large number as the second parameter.
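For example, the two syntaxes in the declaration above are interchangeable:

-- both skip the first 5 rows and return at most 10
select * from orders_history limit 5,10;
select * from orders_history limit 10 offset 5;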

Here is an application example:

select * from orders_history where type=8 limit 10000,10;

This statement queries the 10 rows after the first 10,000 from the orders_history table, i.e. rows 10,001 through 10,010.

By default, records are returned in primary key order (usually id), so the above result is effectively equivalent to:

select * from orders_history where type=8 order by id limit 10000,10;

The three query times are:

3040 ms
3063 ms
3018 ms

For this query method, the following tests the impact of the number of rows fetched on query time:

select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;


The three query times are as follows:

Query 1 record: 3072ms 3092ms 3002ms
Query 10 records: 3081ms 3077ms 3032ms
Query 100 records: 3118ms 3200ms 3128ms
Query 1000 records: 3412ms 3468ms 3394ms
Query 10,000 records: 3749ms 3802ms 3696ms

In addition, I ran more than ten further queries. From the timings, it is basically clear that when fewer than 100 rows are fetched, there is essentially no difference in query time; as the number of fetched rows grows, the time taken grows too.

Testing the effect of the query offset:

select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;


The three query times are as follows:

Offset 100: 25ms 24ms 24ms
Offset 1000: 78ms 76ms 77ms
Offset 10000: 3092ms 3212ms 3128ms
Offset 100000: 3878ms 3812ms 3798ms
Offset 1000000: 14608ms 14062ms 14700ms

As the query offset grows, especially once it exceeds 100,000, the query time increases dramatically.

This paging method scans from the first matching record, so the deeper the page, the slower the query; and the more rows fetched, the slower the query overall.
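You can confirm this with EXPLAIN: MySQL has no way to jump straight to the offset, so the rows skipped by the offset are still read and then thrown away:

-- the plan shows a scan; all 1,000,000 skipped rows are still generated
explain select * from orders_history where type=8 limit 1000000,100;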

Using subquery optimization

This method first locates the id at the offset position, then queries forward from there. It is suitable when the id is monotonically increasing.

-- statement 1
select * from orders_history where type=8 limit 100000,1;

-- statement 2
select id from orders_history where type=8 limit 100000,1;

-- statement 3
select * from orders_history where type=8 and 
id>=(select id from orders_history where type=8 limit 100000,1) 
limit 100;

-- statement 4
select * from orders_history where type=8 limit 100000,100;

The query times for the four statements are as follows:

Statement 1: 3674ms
Statement 2: 1315ms
Statement 3: 1327ms
Statement 4: 3710ms

Notes on the above queries:

  • Comparing statement 1 with statement 2: using select id instead of select * makes the query roughly three times faster.
  • Comparing statement 2 with statement 3: the difference is only tens of milliseconds.
  • Comparing statement 3 with statement 4: thanks to the fast select id subquery, statement 3 is roughly three times faster than statement 4.

Overall, this method is several times faster than the general paging query.

Using id limits for optimization

This method assumes that the table's id values increase contiguously. Based on the page number and page size, we can then calculate the id range of the page and query it with id BETWEEN ... AND:

select * from orders_history where type=2 and id between 1000000 and 1000100 limit 100;

Query time: 15ms 12ms 9ms

This query method can greatly improve query speed, basically completing within tens of milliseconds. Its limitation is that it can only be used when the id values are known in advance. However, since tables are usually created with an auto-increment id field, this brings a lot of convenience to paging queries.

There is another way to write it:

select * from orders_history where id >= 1000001 limit 100;
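A practical variant of this is to page by remembering the last id seen: each page's query starts from the last id of the previous page, so the offset never grows (1000100 below stands in for whatever id the previous page ended on):

-- fetch the next page directly after the previous page's last id
select * from orders_history where id > 1000100 order by id limit 100;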

Of course, you can also query with IN. This is often used when several tables are related, using the set of ids from a query on another table:

select * from orders_history where id in (select order_id from trade_2 where goods = 'pen') limit 100;

Note that some MySQL versions do not support using LIMIT inside the subquery of an IN clause.
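If you run into that restriction, a common workaround is to wrap the limited subquery in a derived table, which MySQL materializes before applying IN (a sketch based on the example above):

select * from orders_history where id in (
    select t.order_id from (
        select order_id from trade_2 where goods = 'pen' limit 100
    ) t
);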

Optimize using temporary tables

This method is not strictly query optimization, but it is mentioned here in passing.

The id-limit optimization above requires the id to increase contiguously. In scenarios where it does not, such as historical tables or tables with missing data, you can consider keeping a temporary table that records the page-boundary ids and using those ids for IN (or range) queries. This can greatly improve the speed of traditional paging queries, especially with tens of millions of rows.
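A minimal sketch of the idea, assuming a hypothetical helper table page_id_index that stores the starting id of each page and is refreshed by a scheduled job:

-- hypothetical helper table: one row per page, holding that page's first id
create table page_id_index (
    page_no  int unsigned primary key,
    start_id int unsigned not null
);

-- a deep page then becomes a cheap primary-key range scan
select * from orders_history
where id >= (select start_id from page_id_index where page_no = 10000)
order by id limit 100;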

A note on the data table's id

Generally, when creating tables in a database, an auto-increment id field is added to every table to make querying easier.

If the data volume is very large, as in an order database, the data is usually sharded across multiple databases and tables. In that case it is not recommended to use the database id as the unique business identifier; instead, a distributed, high-concurrency unique id generator should produce the identifier, which is stored in a separate field of the table.
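For example, a sharded order table might keep the local auto-increment id for paging while storing the generated identifier in its own uniquely indexed column (a sketch; the order_no column name is illustrative):

create table orders_history_0 (
    id       int unsigned not null auto_increment primary key,
    order_no bigint unsigned not null,  -- from the distributed id generator
    type     tinyint(4) not null,
    unique key uk_order_no (order_no)
);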

Using a range query to locate the id (via the index) first, and then using the id to locate the row data, can speed up queries several times over. In other words: first select id, then select *, as in the sketch below.
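One common single-statement form of this pattern is a deferred join: locate the page's ids through the index first, then join back to fetch the full rows (a sketch against the test table):

select h.*
from orders_history h
join (
    select id from orders_history where type=8 order by id limit 100000,100
) t on h.id = t.id;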
