Let's talk about the LIMIT statement in MySQL in detail

Let's talk about the LIMIT statement in MySQL in detail

Recently, many friends asked children a question about LIMIT in the Q&A group. Let me briefly describe this question below.

question

In order for the story to develop smoothly, we must first have a table:

CREATE TABLE t (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    key1 VARCHAR(100),
    common_field VARCHAR(100),
    PRIMARY KEY (id),
    KEY idx_key1 (key1)
) Engine=InnoDB CHARSET=utf8;

Table t contains 3 columns, the id column is the primary key, and the key1 column is the secondary index column. The table contains 10,000 records.

When we execute the following statement, the secondary index idx_key1 is used:

mysql> EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 1;
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+------+------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+------+------+
| 1 | SIMPLE | t | NULL | index | NULL | idx_key1 | 303 | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+------+------+
1 row in set, 1 warning (0.00 sec)

This is easy to understand because in the secondary index idx_key1, the key1 column is ordered. If the query is to obtain the first record sorted by the key1 column, MySQL only needs to obtain the first secondary index record from idx_key1, and then directly return to the table to obtain the complete record.

However, if we change LIMIT 1 in the above statement to LIMIT 5000, 1, we need to scan the entire table and perform filesort. The execution plan is as follows:

mysql> EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 5000, 1;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 9966 | 100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

Some students don't understand: LIMIT 5000, 1 can also use the secondary index idx_key1. We can first scan the 5001th secondary index record and then perform a table return operation on the 5001th secondary index record. This cost is definitely better than full table scan + filesort.

I regret to tell you that due to defects in MySQL implementation, the above ideal situation will not occur. It will only stupidly perform full table scan + filesort. Let's talk about what is going on.

Server layer and storage engine layer

As we all know, MySQL is actually divided into the server layer and the storage engine layer:

  • The server layer is responsible for handling some common things, such as connection management, SQL syntax parsing, and analyzing execution plans.
  • The storage engine layer is responsible for specific data storage, such as whether the data is stored in files or memory, and what the specific storage format is. We now basically use the InnoDB storage engine, and other storage engines are rarely used, so we will not involve other storage engines.

The execution of a SQL statement in MySQL requires multiple interactions between the server layer and the storage engine layer to get the final result. For example, consider the following query:

SELECT * FROM t WHERE key1 > 'a' AND key1 < 'b' AND common_field != 'a';

The server layer will analyze that the above statement can be executed using the following two solutions:

  • Solution 1: Use full table scan
  • Solution 2: Use the secondary index idx_key1. In this case, you need to scan all secondary index records whose key1 column values ​​are between ('a', 'b'), and each secondary index record needs to be back-listed.

The server layer will analyze which of the above two solutions has a lower cost, and then select the solution with lower cost as the execution plan. Then the interface provided by the storage engine is called to actually execute the query.

Here we assume that solution 2 is adopted, that is, to use the secondary index idx_key1 to execute the above query. Then the conversation between the server layer and the storage engine layer can be as follows:

Server layer: "Hey, please check the first record in the ('a', 'b') interval of the idx_key1 secondary index, and return the complete record to me after returning the table."

InnoDB responds: “Got it. I’ll check it right away.” Then, InnoDB quickly locates the first secondary index record in the scan interval ('a', 'b') through the B+ tree corresponding to the idx_key1 secondary index, and then returns the complete clustered index record to the server layer.

After receiving the complete clustered index record, the server layer continues to determine whether the common_field!='a' condition is met. If not, the record is discarded, otherwise the record is sent to the client. Then say to the storage engine: "Please give me the next record"

Tips:

Here, sending the record to the client is actually sending it to the local network buffer. The buffer size is controlled by net_buffer_length and the default size is 16KB. The network packet is actually sent to the client only when the buffer is full.

InnoDB: “Got it, I’ll check it right away.” InnoDB finds the next secondary index record in the ('a', 'b') interval of idx_key1 based on the next_record attribute of the record, then performs a table return operation and returns the obtained complete clustered index record to the server layer.

Tips:
Both clustered index records and secondary index records contain an attribute called next_record. Each record is connected into a linked list based on next_record, and the records in the linked list are sorted by key value (for clustered index, the key value refers to the value of the primary key, for secondary index records, the key value refers to the value of the secondary index column).

After receiving the complete clustered index record, the server layer continues to determine whether the common_field!='a' condition is met. If not, the record is discarded, otherwise the record is sent to the client. Then say to the storage engine: "Please give me the next record"

... and then repeat the above process over and over again.

until:

That is, until InnoDB finds that the next secondary index record obtained according to the next_record of the secondary index record is not in the interval ('a', 'b'), it tells the server layer: "Okay, there is no next record in the interval ('a', 'b')"

When the server layer receives the message from InnoDB that there is no next record, it ends the query.

Now everyone knows the basic interaction process between the server layer and the storage engine layer.

What the hell is LIMIT?

It may be a bit surprising to everyone to say that MySQL will only process the content in the LIMIT clause when the server layer is ready to send records to the client. Take the following statement as an example:

SELECT * FROM t ORDER BY key1 LIMIT 5000, 1;

If you execute the above query using idx_key1, MySQL will handle it like this:

  • The server layer asks InnoDB for the first record. InnoDB obtains the first secondary index record from idx_key1, performs a table return operation to obtain the complete clustered index record, and then returns it to the server layer. The server layer is ready to send it to the client, and it finds that there is a LIMIT 5000, 1 requirement, which means that only the 5001th record that meets the conditions can be actually sent to the client. So let's do a statistics here. We assume that the server layer maintains a variable called limit_count to count how many records have been skipped. At this time, limit_count should be set to 1.
  • The server layer then asks InnoDB for the next record. InnoDB then finds the next secondary index record based on the next_record attribute of the secondary index record, and then returns the complete clustered index record to the server layer. When the server layer sends it to the client, it finds that limit_count is only 1, so it gives up the operation of sending it to the client and increases limit_count by 1. At this time, limit_count becomes 2.
  • ... Repeat the above steps
  • When limit_count is equal to 5000, the server layer will actually send the complete clustered index record returned by InnoDB to the client.

From the above process, we can see that since MySQL will not determine whether the LIMIT clause meets the requirements until the record is actually sent to the client, if the secondary index is used to execute the above query, it means that 5001 table return operations are required. When analyzing the execution plan, the server layer will feel that the cost of executing so many table returns is too high, and it is not as fast as a direct full table scan + filesort, so it chooses the latter to execute the query.

what to do?

Due to the limitations of MySQL's implementation of the LIMIT clause, is it not possible to speed up queries by using secondary indexes when processing statements such as LIMIT 5000, 1? In fact, it is not. Just rewrite the above statement into:

SELECT * FROM t, (SELECT id FROM t ORDER BY key1 LIMIT 5000, 1) AS d
    WHERE t.id = d.id;

In this way, SELECT id FROM t ORDER BY key1 LIMIT 5000, 1 exists as a separate subquery. Because the query list of the subquery has only one id column, MySQL can execute the subquery by scanning only the secondary index idx_key1, and then search table t based on the primary key value obtained in the subquery.

This saves the need to return to the table for the first 5,000 records, greatly improving query efficiency!

Tucao

When will the guys who designed MySQL fix this super stupid implementation of the LIMIT clause? Users have to manually try to deceive the optimizer to improve query efficiency~

This is the end of this article about the LIMIT statement in MySQL. For more information about the LIMIT statement in MySQL, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • A brief discussion on MySQL select optimization solution
  • MySQL select results to perform update example tutorial
  • Solve the problem that MySQL read-write separation causes data not to be selected after insert
  • How MySQL Select Statement is Executed
  • Detailed example of using the distinct method in MySQL
  • Should I use distinct or group by to remove duplicates in MySQL?
  • The difference between distinct and group by in MySQL
  • MySQL series tutorial on understanding the use of union (all) and limit and exists keywords
  • The impact of limit on query performance in MySQL
  • Use of select, distinct, and limit in MySQL

<<:  A brief discussion on read-only and disabled attributes in forms

>>:  Example of implementing translation effect (transfrom: translate) with CSS3

Recommend

CSS3 new layout: flex detailed explanation

Flex Basic Concepts Flex layout (flex is the abbr...

Detailed explanation of MySQL 5.7.9 shutdown syntax example

mysql-5.7.9 finally provides shutdown syntax: Pre...

innodb_flush_method value method (example explanation)

Several typical values ​​of innodb_flush_method f...

Gradient slide effect implemented by CSS3

Achieve results Code html <div class="css...

JSONP cross-domain simulation Baidu search

Table of contents 1. What is JSONP 2. JSONP cross...

Docker connects to the host Mysql operation

Today, the company project needs to configure doc...

Detailed explanation of MySQL syntax, special symbols and regular expressions

Mysql commonly used display commands 1. Display t...

How to deploy Go web applications using Docker

Table of contents Why do we need Docker? Docker d...

What is TypeScript?

Table of contents 1. JavaScript issues 2. Advanta...

Docker Basic Tutorial: Detailed Explanation of Dockerfile Syntax

Preface Dockerfile is a script interpreted by the...

Detailed explanation of built-in methods of javascript array

Table of contents 1. Array.at() 2. Array.copyWith...

Detailed explanation of several methods of JS array dimensionality reduction

Dimensionality reduction of two-dimensional array...

Mysql command line mode access operation mysql database operation

Usage Environment In cmd mode, enter mysql --vers...