The impact of limit on query performance in MySQL

The impact of limit on query performance in MySQL

I. Introduction

First, let me explain the version of MySQL:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.17 |
+-----------+
1 row in set (.00 sec)

Table structure:

mysql> desc test;
+--------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| val | int(10) unsigned | NO | MUL | | |
| source | int(10) unsigned | NO | | | |
+--------+---------------------+------+-----+---------+----------------+
3 rows in set (.00 sec)

id is the auto-increment primary key and val is a non-unique index.

Pour in a large amount of data, a total of 5 million:

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|5242882|
+----------+
1 row in set (4.25 sec)

We know that when the offset in the limit offset rows is large, efficiency problems will occur:

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id | val | source |
+---------+-----+--------+
| 3327622 | 4 | 4 |
| 3327632 | 4 | 4 |
| 3327642 | 4 | 4 |
| 3327652 | 4 | 4 |
| 3327662 | 4 | 4 |
+---------+-----+--------+
5 rows in set (15.98 sec)

In order to achieve the same purpose, we usually rewrite it as follows:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id | val | source | id |
+---------+-----+--------+---------+
| 3327622 | 4 | 4 | 3327622 |
| 3327632 | 4 | 4 | 3327632 |
| 3327642 | 4 | 4 | 3327642 |
| 3327652 | 4 | 4 | 3327652 |
| 3327662 | 4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (.38 sec)

The time difference is obvious.

Why does the above result appear? Let's take a look at the query process of select * from test where val=4 limit 300000,5;:

The index leaf node data is queried.

Query all required field values ​​on the clustered index based on the primary key value on the leaf node.

Similar to the following picture:

As shown above, you need to query the index node 300,005 times, query the clustered index data 300,005 times, and finally filter out the first 300,000 results and take out the last 5. MySQL spends a lot of random I/O to query the data of the clustered index, and the data queried by 300,000 random I/Os will not appear in the result set.

Someone will definitely ask: Since the index is used at the beginning, why not first query along the index leaf nodes to the last 5 nodes required, and then query the actual data in the clustered index. This only requires 5 random I/Os, similar to the process in the following picture:

Actually, I want to ask this question too.

Confirmation

Let's actually do some operations to confirm the above inference:

In order to prove select * from test where val=4 limit 300000,5 scans 300005 index nodes and 300005 data nodes on the clustered index, we need to know whether MySQL has a way to count the number of times data nodes are queried through index nodes in one SQL. I tried the Handler_read_* series first, but unfortunately none of the variables met the conditions.

I can only confirm this indirectly:

InnoDB has a buffer pool. It contains the most recently accessed data pages, including data pages and index pages. So we need to run two SQL statements to compare the number of data pages in the buffer pool. The prediction result is that after running select * from test a inner join (select id from test where val=4 limit 300000,5) , the number of data pages in the buffer pool is much less than the corresponding number of select * from test where val=4 limit 300000,5 ;, because the former sql only accesses the data page 5 times, while the latter sql accesses the data page 300005 times.

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (.04 sec)

It can be seen that there is currently no data page about the test table in the buffer pool.

mysql> select * from test where val=4 limit 300000,5;
+---------+-----+--------+
| id | val | source |
+---------+-----+--------+
| 3327622 | 4 | 4 |
| 3327632 | 4 | 4 |
| 3327642 | 4 | 4 |
| 3327652 | 4 | 4 |
| 3327662 | 4 | 4 |
+---------+-----+--------+
5 rows in set (26.19 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY | 4098 |
| val | 208 |
+------------+----------+
2 rows in set (.04 sec)

It can be seen that at this time there are 4098 data pages and 208 index pages for the test table in the buffer pool.

select * from test a inner join (select id from test where val=4 limit 300000,5) , we need to clear the buffer pool and restart MySQL.

mysqladmin shutdown
/usr/local/bin/mysqld_safe &
mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
Empty set (0.03 sec)

Run SQL:

mysql> select * from test a inner join (select id from test where val=4 limit 300000,5) b on a.id=b.id;
+---------+-----+--------+---------+
| id | val | source | id |
+---------+-----+--------+---------+
| 3327622 | 4 | 4 | 3327622 |
| 3327632 | 4 | 4 | 3327632 |
| 3327642 | 4 | 4 | 3327642 |
| 3327652 | 4 | 4 | 3327652 |
| 3327662 | 4 | 4 | 3327662 |
+---------+-----+--------+---------+
5 rows in set (0.09 sec)

mysql> select index_name,count(*) from information_schema.INNODB_BUFFER_PAGE where INDEX_NAME in('val','primary') and TABLE_NAME like '%test%' group by index_name;
+------------+----------+
| index_name | count(*) |
+------------+----------+
| PRIMARY | 5 |
| val | 390 |
+------------+----------+
2 rows in set (0.03 sec)

We can clearly see the difference between the two: the first sql loads 4098 data pages into the buffer pool, while the second sql only loads 5 data pages into the buffer pool. In line with our prediction. This also confirms why the first SQL statement is slow: it reads a large number of useless data rows (300,000) and then discards them.

And this will cause a problem: loading a lot of data pages that are not very hot into the buffer pool will cause buffer pool pollution and occupy buffer pool space.

Problems encountered

To ensure that the buffer pool is cleared at each restart, we need to turn off innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup. These two options control the dumping of buffer pool data when the database is shut down and the loading of backup buffer pool data on disk when the database is started.

References:

1. https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/

2. https://dev.mysql.com/doc/refman/5.7/en/innodb-information-schema-buffer-pool-tables.html

For more information about SQL execution efficiency, please see the following related articles

You may also be interested in:
  • A brief discussion on MySQL select optimization solution
  • MySQL select results to perform update example tutorial
  • Solve the problem that MySQL read-write separation causes data not to be selected after insert
  • How MySQL Select Statement is Executed
  • Detailed example of using the distinct method in MySQL
  • Should I use distinct or group by to remove duplicates in MySQL?
  • The difference between distinct and group by in MySQL
  • Let's talk about the LIMIT statement in MySQL in detail
  • MySQL series tutorial on understanding the use of union (all) and limit and exists keywords
  • Use of select, distinct, and limit in MySQL

<<:  In-depth understanding of Vue's method of generating QR codes using vue-qr

>>:  Use iframe to display weather effects on web pages

Recommend

Detailed explanation of how to view the number of MySQL server threads

This article uses an example to describe how to v...

Django2.* + Mysql5.7 development environment integration tutorial diagram

environment: MAC_OS 10.12 Python 3.6 mysql 5.7.25...

How to implement Ajax concurrent request control based on JS

Table of contents Preface Ajax serial and paralle...

An article to help you understand Js inheritance and prototype chain

Table of contents Inheritance and prototype chain...

How to install Composer in Linux

1. Download the installation script - composer-se...

Hexadecimal color codes (full)

Red and pink, and their hexadecimal codes. #99003...

Causes and solutions for front-end exception 502 bad gateway

Table of contents 502 bad gateway error formation...

Basic usage of find_in_set function in mysql

Preface This is a new function I came across rece...

What are mysql dirty pages?

Table of contents Dirty pages (memory pages) Why ...

Flash embedded in web pages and IE, FF, Maxthon compatibility issues

After going through a lot of hardships, I searched...

Specific steps to use vant framework in WeChat applet

Table of contents 1. Open the project directory o...

A brief analysis of JS original value and reference value issues

Primitive values ​​-> primitive types Number S...

Analyze the usage and principles of Vue's provide and inject

First, let's talk about why we use provide/in...

How to use macros in JavaScript

In languages, macros are often used to implement ...