PrefaceSorting is a basic function in databases, and MySQL is no exception. Users can achieve the purpose of sorting the specified result set through the Order by statement. In fact, not only the Order by statement, but also the Group by statement and Distinct statement will implicitly use sorting. This article will first briefly introduce how SQL uses indexes to avoid sorting costs, then introduce the internal principles of MySQL sorting and parameters related to sorting. Finally, it will give several "strange" sorting examples to discuss the sorting consistency problem and explain the essential reasons for the phenomenon. 1. Sorting optimization and index usageIn order to optimize the sorting performance of SQL statements, the best situation is to avoid sorting. Proper use of indexes is a good method. Because the index itself is also ordered, if a suitable index is created on the field that needs to be sorted, the sorting process can be skipped, thereby improving the SQL query speed. Below I will use some typical SQL statements to illustrate which SQL statements can use indexes to reduce sorting and which SQL statements cannot. Assume that table t1 has indexes key1(key_part1,key_part2),key2(key2) a. SQL that can use indexes to avoid sorting SELECT * FROM t1 ORDER BY key_part1,key_part2; SELECT * FROM t1 WHERE key_part1 = constant ORDER BY key_part2; SELECT * FROM t1 WHERE key_part1 > constant ORDER BY key_part1 ASC; SELECT * FROM t1 WHERE key_part1 = constant1 AND key_part2 > constant2 ORDER BY key_part2; b. SQL that cannot use indexes to avoid sorting //The sorting field is in multiple indexes, so index sorting cannot be used SELECT * FROM t1 ORDER BY key_part1, key_part2, key2; //The sort key order is inconsistent with the column order in the index, so the index cannot be used for sorting. SELECT * FROM t1 ORDER BY key_part2, key_part1; //The ascending and descending order is inconsistent, and the index sorting cannot be used SELECT * FROM t1 ORDER BY key_part1 DESC, key_part2 ASC; //key_part1 is a range query, key_part2 cannot be sorted using the index SELECT * FROM t1 WHERE key_part1> constant ORDER BY key_part2; 2. Sorting algorithmFor SQL that cannot use indexes to avoid sorting, the database has to implement the sorting function itself to meet user needs. At this time, "Using filesort" will appear in the SQL execution plan. It should be noted here that filesort does not mean file sorting. In fact, it may also be memory sorting, which is mainly determined by the sort_buffer_size parameter and the result set size. There are three main ways to implement sorting in MySQL: conventional sorting, optimized sorting, and priority queue sorting, which mainly involve three sorting algorithms: quick sort, merge sort, and heap sort. Assume that the table structure and SQL statement are as follows: CREATE TABLE t1(id int, col1 varchar(64), col2 varchar(64), col3 varchar(64), PRIMARY KEY(id),key(col1,col2)); SELECT col1,col2,col3 FROM t1 WHERE col1>100 ORDER BY col2; a. Conventional sorting (1) Get the records that meet the WHERE condition from table t1 (2) For each record, take out the record's primary key + sort key (id, col2) and put it into the sort buffer (3) If the sort buffer can store all the (id, col2) pairs that meet the conditions, they are sorted; otherwise, when the sort buffer is full, they are sorted and solidified into a temporary file. (The sorting algorithm used is the quick sort algorithm) (4) If a temporary file is generated during sorting, a merge sort algorithm needs to be used to ensure that the records in the temporary file are in order. (5) Loop through the above process until all records that meet the criteria are sorted. (6) Scan the sorted (id, col2) pairs and use the id to retrieve the columns (col1, col2, col3) that SELECT needs to return. (7) Return the obtained result set to the user. From the above process, whether to use file sorting depends mainly on whether the sort buffer can accommodate the (id, col2) pairs that need to be sorted. The size of this buffer is controlled by the sort_buffer_size parameter. In addition, one sort requires two IOs, one is to retrieve (id, col2), and the second is to retrieve (col1, col2, col3). Since the returned result set is sorted by col2, the ids are in disorder. When retrieving (col1, col2, col3) through disordered ids, a large amount of random IO will be generated. For the second time, MySQL itself has an optimization, that is, before retrieving, the IDs are first sorted and put into a buffer. The size of this buffer is controlled by the parameter read_rnd_buffer_size, and then the records are retrieved in order, converting random IO to sequential IO. b. Optimize sorting In addition to the sorting itself, the conventional sorting method requires two additional IO operations. Compared with conventional sorting, the optimized sorting method reduces the second IO. The main difference is that what is put into the sort buffer is not (id,col2), but (col1,col2,col3). Since the sort buffer contains all the fields required for the query, it can be returned directly after the sort is completed without having to retrieve the data again. The cost of this approach is that the sort buffer of the same size can store fewer (col1, col2, col3) than (id, col2). If the sort buffer is not large enough, it may be necessary to write temporary files, resulting in additional IO. Of course, MySQL provides the parameter max_length_for_sort_data. Only when the sorting tuple is less than max_length_for_sort_data can the optimized sorting method be used, otherwise only the conventional sorting method can be used. c. Priority queue sorting In order to get the final sorting result, no matter what, we need to sort all the records that meet the conditions before returning them. So compared to optimizing the sorting method, is there still room for optimization? Version 5.6 optimizes the Order by limit M, N statement at the space level and adds a new sorting method - priority queue, which is implemented using heap sort. The characteristics of the heap sort algorithm can solve the sorting problem of limit M, N. Although all elements are still required to participate in the sorting, only the sort buffer space of M+N tuples is required. For scenarios with very small M and N, there is basically no problem of needing temporary files for merge sorting due to insufficient sort buffer. For ascending order, a max-heap is used, and the elements in the final heap constitute the smallest N elements. For descending order, a min-heap is used, and the elements in the final heap constitute the largest N elements. 3. Inconsistent sorting problemCase 1 After migrating Mysql from 5.5 to 5.6, duplicate values were found in the paging. Test table and data: create table t1(id int primary key, c1 int, c2 varchar(128)); insert into t1 values(1,1,'a'); insert into t1 values(2,2,'b'); insert into t1 values(3,2,'c'); insert into t1 values(4,2,'d'); insert into t1 values(5,3,'e'); insert into t1 values(6,4,'f'); insert into t1 values(7,5,'g'); Assuming there are 3 records per page, the first page limit is 0,3 and the second page limit is 3,3. The query results are as follows: We can see that the record with id 4 appears in both queries, which is obviously not in line with expectations, and there is no such problem in version 5.5. The reason for this phenomenon is that 5.6 uses a priority queue for limit M, N statements, and the priority queue is implemented using a heap. For example, in the above example, order by c1 asc limit 0, 3 requires a max-heap of size 3; limit 3, 3 requires a max-heap of size 6. Since there are 3 records with c1 being 2, and heap sort is unstable (for the same key value, there is no guarantee that the position after sorting will be consistent with that before sorting), paging duplication occurs. To avoid this problem, we can add a unique value to the sort, such as the primary key id. In this way, since the id is unique, we can ensure that the key values involved in the sort are different. Write the SQL as follows: select * from t1 order by c1,id asc limit 0,3; select * from t1 order by c1,id asc limit 3,3; Case 2 Two similar query statements are identical except for the returned columns, but the sorting results are inconsistent. Test table and data: create table t2(id int primary key, status int, c1 varchar(255),c2 varchar(255),c3 varchar(255),key(c1)); insert into t2 values(7,1,'a',repeat('a',255),repeat('a',255)); insert into t2 values(6,2,'b',repeat('a',255),repeat('a',255)); insert into t2 values(5,2,'c',repeat('a',255),repeat('a',255)); insert into t2 values(4,2,'a',repeat('a',255),repeat('a',255)); insert into t2 values(3,3,'b',repeat('a',255),repeat('a',255)); insert into t2 values(2,4,'c',repeat('a',255),repeat('a',255)); insert into t2 values(1,5,'a',repeat('a',255),repeat('a',255)); Execute the SQL statements separately: select id,status,c1,c2 from t2 force index(c1) where c1>='b' order by status; select id,status from t2 force index(c1) where c1>='b' order by status; The execution results are as follows: See if the execution plans of the two are the same To illustrate the problem, I added the force index hint in the statement to ensure that the c1 column index can be used. The statement retrieves the id through the c1 column index, and then retrieves the returned columns from the table. According to the size of the c1 column value, the relative position of the record in the c1 index is as follows: (c1,id)===(b,6),(b,3),(5,c),(c,2), the corresponding status values are 2 3 2 4 respectively. If we retrieve data from the table and sort them by status, the relative positions become (6,2,b),(5,2,c),(3,3,c),(2,4,c). This is the result returned by the second query statement. So why are the first query statements (6,2,b),(5,2,c) in reverse order? Here you can see the red parts in a. Conventional sorting and b. Optimized sorting that I mentioned earlier, and you will understand the reason. Because the number of bytes of the column returned by the first query exceeds max_length_for_sort_data, conventional sorting is used. In this case, MYSQL sorts the rowid and converts random IO to sequential IO, so 5 is returned first and 6 is returned last. The second query uses optimized sorting, without the second data retrieval process, and the relative position of the sorted records is maintained. For the first statement, if we want to use optimized sorting, we can increase the max_length_for_sort_data setting, such as 2048. 4. Reference Documents
This is the end of this article about MySQL sorting principles and case analysis. For more relevant MySQL sorting principles and case content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Docker image import, export, backup and migration operations
>>: Six weird and useful things about JavaScript
Table of contents 1. Mysql data structure 2. The ...
This article shares with you the graphic tutorial...
A new window opens. Advantages: When the user cli...
Preface Starting from React 16, the concept of Er...
This article shares the specific code of vue unia...
When Mysql associates two tables, an error messag...
Add rules to the el-form form: Define rules in da...
Preface Note: The test database version is MySQL ...
What you will create In this new tutorial, we'...
Table of contents Since Vuex uses a single state ...
Implementing responsive layout with CSS Responsiv...
This article shares the specific code for the WeC...
Flex layout is also called elastic layout. Any co...
1. Hot deployment: It means redeploying the entir...
Utilize the browser's non- overflow:auto elem...