When I think of the word "sort", my first impression is that almost all apps have a sorting place. Taobao products are sorted by purchase time, and Bilibili comments are sorted by popularity... When it comes to sorting in MySQL, what is the first thing that comes to your mind? Keyword order by? Is it best to have an index on the order by field? Are the leaf nodes already in order? Or should we avoid sorting inside MySQL as much as possible? The cause of the incidentNow suppose there is a user's friend table: CREATE TABLE `user` ( `id` int(10) AUTO_INCREMENT, `user_id` int(10), `friend_addr` varchar(1000), `friend_name` varchar(100), PRIMARY KEY (`id`), KEY `user_id` (`user_id`) )ENGINE=InnoDB; There are currently two points in the table that need attention:
One day, a junior development engineer named Xiaoyuan received a request from a junior product manager named Xiaowang: So Xiaoyuan wrote the following sql: select friend_name,friend_addr from user where user_id=? order by name In a flash, Xiaoyuan went online with great fanfare. Everything was going well until one day an operations classmate asked the following question: select friend_name,friend_addr from user where user_id=10086 order by name However, this query was much slower than usual, and the database reported a slow query. Xiaoyuan was panicking: What's going on? There is an index on user_id, and cleverly I only used select friend_name, friend_addr instead of select *. At this time, Xiaoyuan kept comforting himself, telling himself to stay calm, and then suddenly he remembered that there was an explain command. He decided to use explain to check the execution plan of that SQL. After Xiaoyuan used explain, he found a dangerous-looking word in the extra field: using filesort. "This query actually uses the legendary file sort, but if a person does not have many friends, it should be fast even if file sort is used", unless user_id=10086 has many friends. Later, Xiaoyuan checked and found that this user actually has more than 100,000 friends~. The little ape was lost in thought and thought: It seems that I have to take the blame for this. 100,000 data points are a bit too much. And what is the sorting principle of using filesort? Anatomy file sortingSomeone may say that the problem above is that 10w data is too large, and it will be slow even if it is not sorted. This actually makes sense. If 10w data is checked at one time, both the MySQL memory buffer and the network bandwidth will be consumed very much. What if I add a limit of 1000? The problem of network bandwidth has definitely been solved because the overall data packet size has become smaller, but the problem of using filesort has not been solved. Seeing this, you may have questions, does using filesort sort the files? How are they sorted in the file? Or let me ask this: How would you handle it if you were asked to design a sort? With these questions and thoughts, let's take a look at the technical difficulties involved in using filesort and how to solve them?
Everything looks smooth, but sort_buffer takes up memory space, which is awkward. Memory itself is not infinite, it definitely has an upper limit. Of course, sort_buffer cannot be too small. If it is too small, it will not make much sense. In the InnoDB storage engine, this value defaults to 256K. mysql> show variables like 'sort_buffer_size'; +------------------+--------+ | Variable_name | Value | +------------------+--------+ | sort_buffer_size | 262144 | +------------------+--------+ That is to say, if the data to be put into the sort_buffer is larger than 256K, then the quick sort method in the sort_buffer will definitely not work. At this time, you may ask: Can't MySQL automatically expand according to the data size? Well, MySQL is a multi-threaded model. If each thread is expanded, the buffer allocated to other functions will be smaller (such as change buffer, etc.), which will affect the quality of other functions. At this time, we have to change the way to sort. Yes, this is the real file sorting, that is, the temporary file on the disk. MySQL will use the idea of merge sorting to divide the data to be sorted into several parts. After each piece of data is sorted in memory, it will be put into a temporary file. Finally, the data of these sorted temporary files will be merged and sorted again. This is a typical divide and conquer principle. Its specific steps are as follows:
File sorting is very slow, is there any other solution?Through the above sorting process, we know that if the data to be sorted is very large and exceeds the size of sort_buffer, then file sorting is required. File sorting involves batch sorting and merging, which is very time-consuming. The root cause of this problem is that sort_buffer is not enough. I don’t know if you have noticed that our friend_name needs to be sorted, but friend_addr is also stuffed into sort_buffer. In this way, the size of a single line of data is equal to the length of friend_name + the length of friend_addr. Can we store only the friend_name field in sort_buffer? In this way, the overall utilization space will be large, and temporary files may not be needed. That’s right, this is another sorting optimization I’m going to talk about next: rowid sorting. The idea of rowid sorting is to keep unnecessary data out of the sort_buffer and keep only necessary data in the sort_buffer. So what do you think is necessary data? Just put friend_name? This definitely won’t work. After the sorting is complete, what happens to friend_addr? Therefore, we also need to put the primary key id in. After sorting, we can go back to the secondary table through the id and get the friend_addr. Therefore, the general process is as follows:
There are actually a few points to note here:
So the question is, how should MySQL choose between the two methods? The decision of which method to use depends on a certain condition. The condition is the length of a single row in sort_buffer. If the length is too large (the length of friend_name + friend_addr), rowid will be used. Otherwise, the first method uses the length standard based on max_length_for_sort_data, which defaults to 1024 bytes: mysql> show variables like 'max_length_for_sort_data'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | max_length_for_sort_data | 1024 | +--------------------------+-------+ Don't want to go back to the table and sort it againIn fact, no matter which of the above methods is used, they all need to return to the table + sort. Returning to the table is because there is no target field on the secondary index, and sorting is because the data is not ordered. If there is a target field on the secondary index and it is already sorted, then wouldn’t it be the best of both worlds? That's right, it's a joint index. We only need to create a joint index of (user_id, friend_name, friend_addr). In this way, I can get the target data through this index, and the friend_name field is already sorted. There is also a friend_addr field. It's done in one go, without returning to the table or sorting again. Therefore, for the above SQL, its general process is as follows:
Although joint indexes can solve this problem, they should not be established blindly in actual applications. You should determine whether they need to be established based on the actual business logic. If similar queries are not frequent, you do not need to establish them because joint indexes will take up more storage space and maintenance costs. Summarize
The above is the detailed content of the MySQL database order by sorting. For more information about MySQL database order by sorting, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: Docker container time zone error issue
>>: Detailed explanation of six web page image carousel effects implemented with JavaScript
This article shares the specific code of js to im...
Table of contents Changes in the life cycle react...
MySql Download 1. Open the official website and f...
1. Grammar location [=|~|~*|^~|@] /uri/ { ... } 2...
When position is absolute, the percentage of its ...
This article briefly introduces the relationship ...
About Docker Swarm Docker Swarm consists of two p...
After adding –subnet to Docker network Create, us...
Table of contents tool Install the plugin Add a ....
<button> tag <br />Definition and usag...
This article shares the specific code of JavaScri...
In some cases, the data in data needs to be reuse...
Preface: As a junior programmer, I dream of build...
Table of contents 1. Introduction 2. Advantages 3...
background: In MySQL, if there is a limited level...