MySQL database index order by sorting detailed explanation


When I hear the word "sort", my first thought is that almost every app sorts something: Taobao sorts products by purchase time, Bilibili sorts comments by popularity...

When it comes to sorting in MySQL, what comes to mind first? The ORDER BY keyword? That the ORDER BY column should ideally be indexed? That index leaf nodes are already ordered? Or that sorting inside MySQL should be avoided whenever possible?

The cause of the incident

Suppose there is a table of users' friends:

CREATE TABLE `user` (
  `id` int(10) AUTO_INCREMENT,
  `user_id` int(10),
  `friend_addr` varchar(1000),
  `friend_name` varchar(100),  
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
)ENGINE=InnoDB;

Two things about this table deserve attention:

  • It stores the user's ID (user_id), the friend's name (friend_name), and the friend's address (friend_addr)
  • user_id is indexed

One day, Xiaoyuan, a junior development engineer, received a request from Xiaowang, a junior product manager:
Xiaowang: Comrade Xiaoyuan, we need to add a feature to the backend. Given a user ID, it should return the names and addresses of all of that user's friends, with the friends' names sorted in dictionary order.
Xiaoyuan: Okay, that's simple. I'll ship it right away.

So Xiaoyuan wrote the following SQL:

select friend_name,friend_addr from user where user_id=? order by friend_name

Xiaoyuan shipped it in no time, with great fanfare. Everything went well until one day a colleague from operations ran this query:

select friend_name,friend_addr from user where user_id=10086 order by friend_name

This query was much slower than usual, and the database flagged it as a slow query. Xiaoyuan panicked: what's going on? There is an index on user_id, and I was careful to select only friend_name and friend_addr instead of using select *. He kept telling himself to stay calm, and then he remembered the explain command. He used explain to check the execution plan of the SQL and found an ominous phrase in the Extra column: Using filesort.

"This query actually uses the legendary filesort. But if a user doesn't have many friends, it should still be fast even with filesort," unless, of course, user_id=10086 has a lot of friends. Xiaoyuan checked and found that this user did in fact have more than 100,000 friends.

Xiaoyuan fell into thought: it looks like I'll have to take the blame for this one. 100,000 rows is a lot. But how does Using filesort actually sort?

Anatomy of file sorting

Some may say the real problem is that 100,000 rows is simply too much data, and the query would be slow even without sorting. That's fair: fetching 100,000 rows at once consumes a lot of MySQL buffer memory and network bandwidth. What if we add limit 1000? The bandwidth problem is certainly solved, because the response is much smaller, but Using filesort remains. You may now wonder: does Using filesort sort files? How does sorting in a file work? Or put another way: how would you design the sort if it were your job? With these questions in mind, let's look at the technical difficulties behind Using filesort and how MySQL solves them.

  1. user_id is indexed, so MySQL first searches the user_id index tree for the target data, i.e. the entries with user_id=10086. However, the query needs friend_name and friend_addr, and the user_id index alone does not contain the values of these two fields.
  2. So MySQL has to go back to the table: it uses the primary key stored in the user_id index entry to search the primary key index tree. There it finds the friend_name and friend_addr of the first row with user_id=10086.
  3. What next? Returning the row directly is clearly wrong, because the results must be sorted by friend_name, and they cannot be sorted until all the data has been found. So the rows found so far have to be kept somewhere first: the sort_buffer. As the name suggests, sort_buffer is the buffer used for sorting. Note that each thread has its own sort_buffer, mainly to avoid the lock contention that multiple threads working on the same block of memory would cause.
  4. Once the friend_name and friend_addr of the first row are in sort_buffer, the same steps are repeated until the friend_name and friend_addr of every row with user_id=10086 are in sort_buffer.
  5. With all the data in sort_buffer, it is time to sort. MySQL performs a quick sort on friend_name, after which the rows in sort_buffer are ordered by friend_name.
  6. Finally, the first 1000 rows in sort_buffer are returned and the process ends.
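The six steps above can be modelled as a toy program. This is a minimal sketch under stated assumptions, not MySQL's actual implementation: the user_id index and the primary key index are hypothetical Python dictionaries, sort_buffer is an ordinary list with no size limit, and the in-memory sort is a textbook quick sort on friend_name.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # what goes into sort_buffer: (friend_name, friend_addr)


def quicksort(rows: List[Row]) -> List[Row]:
    """Textbook quick sort on friend_name (the first element of each row)."""
    if len(rows) <= 1:
        return rows
    pivot = rows[len(rows) // 2][0]
    smaller = [r for r in rows if r[0] < pivot]
    equal = [r for r in rows if r[0] == pivot]
    larger = [r for r in rows if r[0] > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def full_field_sort(
    user_id_index: Dict[int, List[int]],         # hypothetical: user_id -> primary key ids
    clustered: Dict[int, Tuple[int, str, str]],  # hypothetical: id -> (user_id, friend_name, friend_addr)
    user_id: int,
    limit: int,
) -> List[Row]:
    sort_buffer: List[Row] = []
    # Steps 1-4: walk the user_id index and go back to the table for every id,
    # copying both selected columns into sort_buffer.
    for pk in user_id_index.get(user_id, []):
        _, friend_name, friend_addr = clustered[pk]
        sort_buffer.append((friend_name, friend_addr))
    # Step 5: sort sort_buffer in memory by friend_name.
    sort_buffer = quicksort(sort_buffer)
    # Step 6: return only the first `limit` rows.
    return sort_buffer[:limit]


clustered = {
    1: (10086, "carol", "addr-c"),
    2: (10086, "alice", "addr-a"),
    3: (42, "zed", "addr-z"),
    4: (10086, "bob", "addr-b"),
}
user_id_index = {10086: [1, 2, 4], 42: [3]}
print(full_field_sort(user_id_index, clustered, 10086, limit=2))
# [('alice', 'addr-a'), ('bob', 'addr-b')]
```

Both selected columns travel through sort_buffer in this version, which is exactly what makes each entry wide.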

Everything looks smooth, but sort_buffer takes up memory, and that is the catch. Memory is not infinite, so sort_buffer must have an upper limit. Of course, it cannot be too small either, or it would be of little use. By default, sort_buffer_size is 256 KB (262144 bytes):

mysql> show variables like 'sort_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| sort_buffer_size | 262144 |
+------------------+--------+

In other words, if the data to be put into sort_buffer is larger than 256 KB, it can no longer be sorted entirely in memory with quick sort. You may ask: can't MySQL simply grow the buffer to match the data size? The problem is that MySQL uses a multi-threaded model. If every thread's buffer could grow freely, less memory would be left for other buffers (such as the change buffer), which would hurt their performance.

At this point, MySQL has to sort a different way, and this is file sorting in the true sense: it uses temporary files on disk. MySQL applies the idea of merge sort, dividing the data to be sorted into several chunks. Each chunk is sorted in memory and then written to a temporary file. Finally, the sorted temporary files are merged. This is a typical divide-and-conquer approach. The steps are as follows:

  1. First, split the data to be sorted into pieces that can be put into the sort_buffer.
  2. Sort each piece of data in the sort_buffer and write it to a temporary file after sorting.
  3. Once all the data has been written out, each temporary file is sorted on its own, but the data as a whole is not, so the files must be merged next.
  4. Suppose there are two temporary files, tmpX and tmpY. MySQL reads part of tmpX into memory, then part of tmpY. You may wonder why it reads only part of each file, rather than the whole file or a single row at a time. Disks are slow, so each read should bring in as much data as possible, but not too much, because buffer space is limited.
  5. Suppose tmpX[0-5] has been read from tmpX and tmpY[0-5] from tmpY. Now compare the heads: if tmpX[0] < tmpY[0], then tmpX[0] must be the smallest value overall. Next compare tmpX[1] with tmpY[0]; if tmpX[1] > tmpY[0], then tmpY[0] must be the second smallest. Comparing element by element like this merges tmpX and tmpY into a single sorted file, tmpZ. Several such tmpZ files can in turn be merged again, until all the data ends up in one large sorted file.
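The split, sort and merge steps above form a classic external merge sort. The sketch below illustrates the technique under simplifying assumptions and is not MySQL's filesort code: the buffer capacity is counted in rows instead of bytes, Python's built-in sorted() stands in for the in-memory quick sort, each sorted chunk is written to a temporary file created with the tempfile module, and runs are merged two at a time (tmpX + tmpY into tmpZ) as in step 5. Python's buffered file reads pull a block of each file into memory at a time, which plays the role of reading part of the data in step 4. Values are assumed not to contain newlines.

```python
import os
import tempfile
from typing import Iterable, Iterator, List


def write_run(sorted_items: List[str]) -> str:
    """Write one sorted chunk to a new temporary file, one value per line."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for item in sorted_items:
            f.write(item + "\n")
    return path


def read_run(path: str) -> Iterator[str]:
    """Stream a run back; the file object buffers a block of the file at a time."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def merge_two(path_x: str, path_y: str) -> str:
    """Merge two sorted runs (tmpX and tmpY) into a new sorted run (tmpZ)."""
    xs, ys = read_run(path_x), read_run(path_y)
    x, y = next(xs, None), next(ys, None)
    fd, path_z = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as out:
        # Always emit the smaller of the two current heads (tmpX[i] vs tmpY[j]).
        while x is not None and y is not None:
            if x <= y:
                out.write(x + "\n")
                x = next(xs, None)
            else:
                out.write(y + "\n")
                y = next(ys, None)
        # One run is exhausted; copy the rest of the other one.
        while x is not None:
            out.write(x + "\n")
            x = next(xs, None)
        while y is not None:
            out.write(y + "\n")
            y = next(ys, None)
    os.remove(path_x)
    os.remove(path_y)
    return path_z


def external_sort(items: Iterable[str], buffer_rows: int) -> List[str]:
    if buffer_rows < 1:
        raise ValueError("buffer_rows must be at least 1")
    runs: List[str] = []
    buffer: List[str] = []
    # Steps 1-2: fill the buffer, sort it in memory, and spill it to a temporary file.
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_rows:
            runs.append(write_run(sorted(buffer)))
            buffer = []
    if not runs:
        # Everything fitted in the buffer: a purely in-memory sort, no temporary files.
        return sorted(buffer)
    if buffer:
        runs.append(write_run(sorted(buffer)))
    # Steps 3-5: merge runs pairwise, round after round, until one sorted run is left.
    while len(runs) > 1:
        merged = [merge_two(runs[i], runs[i + 1]) for i in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:
            merged.append(runs[-1])  # the odd run out waits for the next round
        runs = merged
    result = list(read_run(runs[0]))
    os.remove(runs[0])
    return result


names = ["mia", "ann", "zoe", "bob", "kim", "eve", "lou"]
print(external_sort(names, buffer_rows=3))
# ['ann', 'bob', 'eve', 'kim', 'lou', 'mia', 'zoe']
```

With buffer_rows=3 the seven names become three runs, which are merged into one sorted file over two rounds; with a buffer large enough for all of them, no temporary file is created at all.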

File sorting is very slow. Is there another solution?

From the process above, we know that if the data to be sorted is too large for sort_buffer, file sorting is required. File sorting involves sorting in batches and then merging, which is very time-consuming. The root cause is that sort_buffer is too small. Notice that only friend_name needs to be sorted, yet friend_addr is also stuffed into sort_buffer, so each row takes up the length of friend_name plus the length of friend_addr. Could sort_buffer hold only the friend_name field? Then many more rows would fit, and temporary files might not be needed at all. That is exactly the next sorting optimization: rowid sorting.
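A rough back-of-the-envelope calculation shows why narrower rows help. The sketch assumes, purely for illustration, that every row occupies a fixed width equal to the sum of its fields, with friend_name taking 100 bytes, friend_addr 1000 bytes and the int id 4 bytes (single-byte characters, columns filled to capacity). Real MySQL packs variable-length values and adds per-row overhead, so actual numbers will differ.

```python
SORT_BUFFER_SIZE = 262144  # the default sort_buffer_size shown earlier, in bytes

# Hypothetical field widths in bytes: single-byte characters, columns filled to capacity.
FRIEND_NAME_BYTES = 100
FRIEND_ADDR_BYTES = 1000
ID_BYTES = 4  # an int primary key


def rows_that_fit(row_bytes: int, buffer_bytes: int = SORT_BUFFER_SIZE) -> int:
    """How many fixed-width rows fit in the sort buffer at once."""
    return buffer_bytes // row_bytes


def chunks_needed(total_rows: int, row_bytes: int) -> int:
    """How many buffer-sized chunks (temporary files) the data must be split into."""
    per_chunk = rows_that_fit(row_bytes)
    return -(-total_rows // per_chunk)  # ceiling division


wide_row = FRIEND_NAME_BYTES + FRIEND_ADDR_BYTES  # both selected columns
narrow_row = FRIEND_NAME_BYTES + ID_BYTES         # sort key plus primary key only

print(rows_that_fit(wide_row), rows_that_fit(narrow_row))                    # 238 2520
print(chunks_needed(100_000, wide_row), chunks_needed(100_000, narrow_row))  # 421 40
```

Under these assumptions the narrow row lets about ten times as many rows share the same buffer. For a user with 100,000 friends, even the narrow rows still need dozens of chunks, so temporary files are not always avoided.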

The idea of rowid sorting is to keep unnecessary data out of sort_buffer and store only what is needed. So what is needed? Only friend_name? That won't work: once sorting is finished, where does friend_addr come from? So the primary key id must go into sort_buffer as well. After sorting, MySQL can go back to the table by id to fetch friend_addr. The general process is as follows:

  1. Use the user_id index to find the target data, then go back to the table and put only id and friend_name into sort_buffer.
  2. Repeat step 1 until all target data is in sort_buffer.
  3. Sort the data in sort_buffer by friend_name.
  4. After sorting, go back to the table by id to fetch friend_addr; the process ends once 1000 rows have been returned.

There are a few points to note here:

  • This method goes back to the table twice: once before sorting and once after.
  • Although each entry in sort_buffer is now smaller, if the amount of data is still large, temporary files are still needed for sorting.
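Rowid sorting can be sketched the same toy way. Again this is an illustration under stated assumptions, not MySQL internals: the indexes are hypothetical dictionaries, Python's list.sort() stands in for the in-memory quick sort, and a counter records how many times the primary key index is visited, to make the two trips back to the table visible.

```python
from typing import Dict, List, Tuple

# Hypothetical primary key index: id -> (user_id, friend_name, friend_addr)
Clustered = Dict[int, Tuple[int, str, str]]


def rowid_sort(
    user_id_index: Dict[int, List[int]],  # hypothetical: user_id -> primary key ids
    clustered: Clustered,
    user_id: int,
    limit: int,
) -> Tuple[List[Tuple[str, str]], int]:
    """Return the sorted rows plus the number of primary key index lookups made."""
    lookups = 0
    # Steps 1-2: first trip back to the table; keep only (friend_name, id).
    sort_buffer: List[Tuple[str, int]] = []
    for pk in user_id_index.get(user_id, []):
        _, friend_name, _ = clustered[pk]
        lookups += 1
        sort_buffer.append((friend_name, pk))
    # Step 3: sort the narrow entries by friend_name (ties fall back to id).
    sort_buffer.sort()
    # Step 4: second trip back to the table by id to fetch friend_addr, up to `limit` rows.
    result: List[Tuple[str, str]] = []
    for friend_name, pk in sort_buffer[:limit]:
        _, _, friend_addr = clustered[pk]
        lookups += 1
        result.append((friend_name, friend_addr))
    return result, lookups


clustered = {
    1: (10086, "carol", "addr-c"),
    2: (10086, "alice", "addr-a"),
    3: (42, "zed", "addr-z"),
    4: (10086, "bob", "addr-b"),
}
user_id_index = {10086: [1, 2, 4], 42: [3]}
print(rowid_sort(user_id_index, clustered, 10086, limit=2))
# ([('alice', 'addr-a'), ('bob', 'addr-b')], 5)
```

The lookup counter makes the cost concrete: three lookups to fill sort_buffer, plus two more for the rows actually returned.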

So how does MySQL choose between the two methods? It depends on the length of a single row that would be put into sort_buffer, here the length of friend_name + friend_addr. If that length exceeds max_length_for_sort_data, rowid sorting is used; otherwise, the first method is used. max_length_for_sort_data defaults to 1024 bytes:

mysql> show variables like 'max_length_for_sort_data';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| max_length_for_sort_data | 1024  |
+--------------------------+-------+
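The choice described above boils down to a single comparison. The function below is a hedged model of that rule as stated here (the length of the selected fields versus max_length_for_sort_data); the function name and the byte lengths are hypothetical, and actual MySQL versions weigh more factors and may behave differently.

```python
from typing import Dict

MAX_LENGTH_FOR_SORT_DATA = 1024  # default value shown above, in bytes


def choose_sort_mode(field_bytes: Dict[str, int], max_len: int = MAX_LENGTH_FOR_SORT_DATA) -> str:
    """Model of the rule: rowid sorting when a row would exceed max_length_for_sort_data."""
    row_length = sum(field_bytes.values())
    return "rowid" if row_length > max_len else "full-field"


# Hypothetical byte lengths for the two selected columns.
print(choose_sort_mode({"friend_name": 100, "friend_addr": 1000}))              # rowid
print(choose_sort_mode({"friend_name": 100, "friend_addr": 200}))               # full-field
print(choose_sort_mode({"friend_name": 100, "friend_addr": 200}, max_len=256))  # rowid
```

Lowering the threshold below the row length, as in the last call, pushes the same query into rowid sorting.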

Skipping both the table lookup and the sort

In fact, whichever of the methods above is used, MySQL has to go back to the table and then sort. It goes back to the table because the secondary index lacks the target fields, and it sorts because the data is not ordered. If the secondary index already contained the target fields in sorted order, we would have the best of both worlds.

Exactly: a joint index. We only need to create a joint index on (user_id, friend_name, friend_addr). The target data can then be read straight from this index: within each user_id, friend_name is already sorted, and friend_addr is stored alongside it. Everything is done in one pass, with no going back to the table and no extra sort. For the SQL above, the process is roughly as follows:

  • Find the entries for user_id=10086 through the joint index, read the corresponding friend_name and friend_addr fields, and return them directly; since friend_name is already sorted, no additional processing is needed.
  • Repeat the first step, continuing forward along the leaf nodes until reaching the first entry whose user_id is not 10086.
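With the joint index, the query becomes a range scan over entries that are already in (user_id, friend_name, friend_addr) order. The sketch models the index's leaf level as a sorted Python list of tuples and uses bisect.bisect_left to jump to the first entry for the user. This is a simplification under stated assumptions: a real B+ tree descends from the root and follows links between leaf pages, and the names here are hypothetical.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[int, str, str]  # (user_id, friend_name, friend_addr), as stored in the joint index


def covering_index_scan(leaf_entries: List[Entry], user_id: int, limit: int) -> List[Tuple[str, str]]:
    """Read (friend_name, friend_addr) for one user straight from the sorted index entries."""
    # Jump to the first entry for this user: (user_id,) sorts before (user_id, name, addr).
    i = bisect.bisect_left(leaf_entries, (user_id,))
    result: List[Tuple[str, str]] = []
    # Scan forward until the user_id changes or enough rows are collected.
    # No table lookup and no sort: the entries are already ordered by friend_name.
    while i < len(leaf_entries) and leaf_entries[i][0] == user_id and len(result) < limit:
        _, friend_name, friend_addr = leaf_entries[i]
        result.append((friend_name, friend_addr))
        i += 1
    return result


index = sorted([
    (10086, "carol", "addr-c"),
    (42, "zed", "addr-z"),
    (10086, "alice", "addr-a"),
    (10086, "bob", "addr-b"),
    (10087, "amy", "addr-m"),
])
print(covering_index_scan(index, 10086, limit=1000))
# [('alice', 'addr-a'), ('bob', 'addr-b'), ('carol', 'addr-c')]
```

The scan stops at the (10087, ...) entry, mirroring the rule of walking the leaf nodes until the first entry that does not belong to 10086.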

Although joint indexes solve this problem, they should not be created blindly in real applications. Decide whether to create one based on the actual business logic: if queries like this are infrequent, skip it, because a joint index costs extra storage and maintenance.

Summary

  1. When ORDER BY cannot use an index for ordering, EXPLAIN shows "Using filesort" in the Extra column.
  2. Don't panic when Using filesort appears. If the data volume is small, say only a few dozen rows, a quick sort in the sort buffer is still very fast.
  3. If the data volume is large and exceeds the size of the sort buffer, temporary files are used for sorting, i.e. merge sort. This is decided by the MySQL optimizer.
  4. If the query selects many fields and you want to avoid sorting with temporary files, you can try setting max_length_for_sort_data to a value smaller than the total length of the queried fields. This makes MySQL use rowid sorting, which may avoid the problem, at the cost of one more trip back to the table.
  5. In real business scenarios, you can also create a joint index on frequently queried combinations of fields, so that neither the table lookup nor a separate sort is needed, though the joint index costs extra storage and maintenance.
  6. When querying a large amount of data, query in batches, and run EXPLAIN beforehand to check the SQL execution plan.

That concludes this detailed look at ORDER BY sorting in MySQL. For more on the topic, see the related articles on 123WORDPRESS.COM.
