Why does using limit in MySQL affect performance?

Why does using limit in MySQL affect performance?

First, let me explain the version of MySQL:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17 |
+-----------+
1 row in set (0.00 sec)

Table structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| val | int(10) unsigned | NO | MUL | 0 | |
| source | int(10) unsigned | NO | | 0 | |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

id is the auto-increment primary key and val is a non-unique index.

Pour in a large amount of data, a total of 5 million:

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|5242882|
+----------+
1 row in set (4.25 sec)

We know that when offset in limit offset rows is large, efficiency problems will occur:

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id | val | source |
+---------+-----+--------+
| 3327622 | 4 | 4 |
| 3327632 | 4 | 4 |
| 3327642 | 4 | 4 |
| 3327652 | 4 | 4 |
| 3327662 | 4 | 4 |
+---------+-----+--------+
5 rows in set (15.98 sec)

In order to achieve the same purpose, we usually rewrite it as follows:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id | val | source | id |
+---------+-----+--------+---------+
| 3327622 | 4 | 4 | 3327622 |
| 3327632 | 4 | 4 | 3327632 |
| 3327642 | 4 | 4 | 3327642 |
| 3327652 | 4 | 4 | 3327652 |
| 3327662 | 4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.38 sec)

The time difference is obvious.

Why does the above result appear? Let's take a look at the query process of select * from test where val=4 limit 300000,5;

The index leaf node data is queried.
Query all required field values ​​on the clustered index based on the primary key value on the leaf node.

Similar to the following picture:

As shown above, you need to query the index node 300005 times, query the clustered index data 300005 times, and finally filter out the first 300000 results and take out the last 5. MySQL spends a lot of random I/O to query the data of the clustered index, and the data queried by 300000 random I/O will not appear in the result set.

Someone will definitely ask: Since the index is used at the beginning, why not first query along the index leaf nodes to the last 5 nodes required, and then query the actual data in the clustered index. This only requires 5 random I/Os, similar to the process in the following picture:

Confirmed:

Let's actually do some operations to confirm the above inference:

In order to prove select * from test where val=4 limit 300000 , 5 scans 300005 index nodes and 300005 data nodes on the clustered index, we need to know whether MySQL has a way to count the number of times data nodes are queried through index nodes in one SQL. I tried Handler_read_* series first, but unfortunately none of the variables met the conditions.

I can only confirm this indirectly:

InnoDB has buffer pool . It contains the most recently accessed data pages, including data pages and index pages. So we need to run two SQL statements to compare the number of data pages in buffer pool . The prediction result is that after running select * from test a inner join (select id from test where val=4 limit 300000,5) b> , the number of data pages in buffer pool is much less than the corresponding number of select * from test where val=4 limit 300000, 5;, because the former sql only accesses the data page 5 times, while the latter sql accesses the data page 300005 times.

select * from test where val=4 limit 300000,5

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.04 sec)

It can be seen that there is currently no data page about the test table in buffer pool .

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id | val | source |
+---------+-----+--------+
| 3327622 | 4 | 4 |
| 3327632 | 4 | 4 |
| 3327642 | 4 | 4 |
| 3327652 | 4 | 4 |
| 3327662 | 4 | 4 |
+---------+-----+--------+
5 rows in set (26.19 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY | 4098 |
| val | 208 |
+------------+----------+
2 rows in set (0.04 sec)

It can be seen that at this time there are 4098 data pages and 208 index pages for the test table in buffer pool .

select * from test a inner join (select id from test where val=4 limit 300000,5) b> , we need to clear buffer pool and restart mysql。

mysqladmin shutdown
/usr/local/bin/mysqld_safe &
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)

Run sql:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id | val | source | id |
+---------+-----+--------+---------+
| 3327622 | 4 | 4 | 3327622 |
| 3327632 | 4 | 4 | 3327632 |
| 3327642 | 4 | 4 | 3327642 |
| 3327652 | 4 | 4 | 3327652 |
| 3327662 | 4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY | 5 |
| val | 390 |
+------------+----------+
2 rows in set (0.03 sec)

We can clearly see the difference between the two: the first sql loads 4098 data pages into buffer pool , while the second sql only loads 5 data pages into buffer pool。 In line with our prediction. This also confirms why the first SQL statement is slow: it reads a large number of useless data rows (300,000) and then discards them.

And this will cause a problem: loading a lot of data pages that are not very hot into buffer pool will cause buffer pool pollution and occupy buffer pool space.

Problems encountered:

To ensure that buffer pool is cleared at each restart, we need to turn off innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup . These two options control the dumping of buffer pool data when the database is shut down and the loading of the backup buffer pool data on disk when the database is started.

This concludes this article on why using limit in MySQL affects performance. For more information on the performance impact of using limit in MySQL, please search previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Solution to data duplication when using limit+order by in MySql paging
  • Why does MySQL paging become slower and slower when using limit?
  • MySQL optimization query_cache_limit parameter description
  • Detailed explanation of the pitfalls of mixing MySQL order by and limit
  • Reasons and optimization solutions for slow MySQL limit paging with large offsets
  • Mysql sorting and paging (order by & limit) and existing pitfalls
  • MySQL uses limit to implement paging example method
  • How to use MySQL limit and solve the problem of large paging
  • A brief discussion on the performance issues of MySQL paging limit
  • MySQL limit performance analysis and optimization
  • Analysis of the Poor Performance Caused by Large Offset of LIMIT in MySQL Query

<<:  zabbix custom monitoring nginx status implementation process

>>:  Summary of events that browsers can register

Recommend

Next.js Getting Started Tutorial

Table of contents Introduction Create a Next.js p...

Some improvements in MySQL 8.0.24 Release Note

Table of contents 1. Connection Management 2. Imp...

Summary of three rules for React state management

Table of contents Preface No.1 A focus No.2 Extra...

Detailed explanation of three ways to connect Docker containers to each other

There are three ways to interconnect and communic...

Detailed explanation of Redis master-slave replication practice using Docker

Table of contents 1. Background 2. Operation step...

Detailed explanation of MySQL index selection and optimization

Table of contents Index Model B+Tree Index select...

CSS to achieve compatible text alignment in different browsers

In the front-end layout of the form, we often nee...

Fixed a bug caused by scrollbar occupying space

background This bug was caused by滾動條占據空間. I check...

Detailed steps for developing Java payment interface for Alipay

Table of contents first step Step 2 Step 3 Step 4...

A detailed tutorial on using Docker to build a complete development environment

Introduction to DNMP DNMP (Docker + Nginx + MySQL...

A complete list of commonly used MySQL functions (classified and summarized)

1. Mathematical Functions ABS(x) returns the abso...

HTML table markup tutorial (4): border color attribute BORDERCOLOR

To beautify the table, you can set different bord...

An article to quickly understand Angular and Ionic life cycle and hook functions

Table of contents Angular accomplish Calling orde...