MySQL sorting feature details

1. Problem scenario

We recently launched a transaction record export feature. The logic is simple: export the rows that match the query conditions. Because the data volume is large, the database is queried page by page, 1,000 rows at a time.

Self-testing and the test environment showed no problems, but after the release the operations team reported duplicate records in the exported data.

I first suspected a business logic bug and reviewed the code again, but could not find the cause. In the end I extracted the SQL statement, ran it on its own, exported the data, and compared the results. The duplicates turned out to be caused by the unstable order of the query results.

2. Cause analysis

The query sorts by create_time in descending order and paginates with LIMIT. Normally this works. Under high business concurrency, however, many rows end up with the same create_time value, and LIMIT-based pagination over those rows becomes unstable.

For example, suppose a batch of rows all have the create_time value "2021-10-28 12:12:12", and that batch straddles the boundary between the first and second pages of 1,000 rows. When the second page is queried, rows that were already returned on the first page may be returned again.

In other words, a row can land on the first page in one query and on the second page in another, so some exported rows are duplicated and others are missing.
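The pattern described above can be sketched as follows. The table name tb_order is taken from the query in the solution section; the page size of 1,000 comes from the scenario, and the alias rows_per_time is only illustrative. The first two statements are the kind of paginated queries that can overlap, and the third is one possible way to check how many rows share a create_time value.

```sql
-- Page 1 and page 2 of the export, sorted only by create_time.
-- Rows that share a create_time value may be ordered differently by each
-- statement, so a row can appear on both pages or on neither.
SELECT * FROM tb_order ORDER BY create_time DESC LIMIT 0, 1000;
SELECT * FROM tb_order ORDER BY create_time DESC LIMIT 1000, 1000;

-- Diagnostic: list the create_time values shared by the most rows.
SELECT create_time, COUNT(*) AS rows_per_time
FROM tb_order
GROUP BY create_time
HAVING COUNT(*) > 1
ORDER BY rows_per_time DESC, create_time DESC
LIMIT 10;
```

If the diagnostic returns rows, any page boundary that falls inside one of those groups is a candidate for duplicated or missing records.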

I checked the official MySQL 5.7 and 8.0 documentation, which describes this behavior as follows:

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

To summarize: when multiple rows have identical values in the ORDER BY columns, the MySQL server may return those rows in any order, and that order can change with the overall execution plan.

Put simply, the order of rows with equal sort keys is arbitrary and cannot be relied on, so results can appear out of order even though ORDER BY is used.

3. Solution

The basic fix is to make the ORDER BY key unique. Add another column as a tie-breaker, such as id:

SELECT * FROM tb_order ORDER BY create_time DESC, id DESC;

When create_time values are equal, rows are then ordered by id. As long as id is unique (for example, an auto-increment primary key), the order is fully determined and the problem no longer occurs.
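Applied to the paginated export, the fix might look like the sketch below. It assumes that id is a unique column of tb_order and keeps the page size of 1,000 from the scenario; the column list and offsets are illustrative only.

```sql
-- Page 1 and page 2 with id as a tie-breaker.
-- Assumes id is unique (for example, the primary key of tb_order),
-- so every row has a distinct (create_time, id) sort key.
SELECT * FROM tb_order ORDER BY create_time DESC, id DESC LIMIT 0, 1000;
SELECT * FROM tb_order ORDER BY create_time DESC, id DESC LIMIT 1000, 1000;
```

This makes the order deterministic for a fixed set of rows. Rows inserted or deleted between two page queries can still shift the offsets, which is a separate issue from the one discussed here.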

4. Expand your knowledge

In fact, the official MySQL documentation explains all of this clearly and gives examples. The following is a brief summary of that content and its examples.

4.1 LIMIT query optimization

If we need only part of a result set, we should not fetch all the rows and discard the extras; we should restrict the query with a LIMIT clause instead.

MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:

  • If you select only a few rows with LIMIT, MySQL uses indexes in some cases where it would normally prefer a full table scan.
  • If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If the ordering is done through an index, this is very fast. If a filesort is required, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count rows are found. After those rows are found, MySQL does not sort the remainder of the result set. One manifestation of this behavior is that an ORDER BY query may return rows in a different order with and without LIMIT.
  • If you combine LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.
  • In some cases, a GROUP BY can be resolved by reading the index in order (or sorting on the index) and then calculating summaries until the index value changes. In this case, LIMIT row_count does not calculate any unnecessary GROUP BY values.
  • As soon as MySQL has sent the required number of rows to the client, it aborts the query unless SQL_CALC_FOUND_ROWS is used. In that case, the total number of rows can be retrieved with SELECT FOUND_ROWS(). (Both are deprecated as of MySQL 8.0.17.)
  • LIMIT 0 quickly returns an empty set and is often used to check the validity of a query. It can also be used in applications to obtain the types of the result columns. In the mysql client, you can use the --column-type-info option to display result column types.
  • If the server uses temporary tables to resolve a query, it uses the LIMIT row_count clause to calculate how much space is required.
  • If an index is not used for ORDER BY but a LIMIT clause is present, the optimizer may be able to avoid using a merge file and sort the rows in memory with an in-memory filesort operation.
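Whether a particular query is sorted through an index or through a filesort can be checked with EXPLAIN. The sketch below reuses the tb_order table from the solution section; the index name idx_create_time_id is hypothetical, and whether the optimizer actually uses such an index depends on the MySQL version, the data, and the rest of the query.

```sql
-- Inspect the plan of the paginated query.
-- "Using filesort" in the Extra column means the rows are sorted
-- without the help of an index.
EXPLAIN SELECT * FROM tb_order ORDER BY create_time DESC, id DESC LIMIT 1000;

-- Hypothetical index (the name is an assumption) that may let MySQL read
-- rows in (create_time, id) order instead of sorting them.
CREATE INDEX idx_create_time_id ON tb_order (create_time, id);
```

Running the EXPLAIN statement again after creating the index shows whether the plan changed.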

Now that we have covered some features of LIMIT, let's return to the focus of this article: using LIMIT row_count together with ORDER BY.

4.2 Using LIMIT and ORDER BY together

As the second point above notes, when LIMIT row_count is combined with ORDER BY, the order of rows with equal sort keys is not guaranteed. LIMIT is one of the factors that affect the execution plan, so the same query may return rows in a different order with and without LIMIT.

In the following example, the query sorts by the category column, and the order of rows within each category (visible in the id and rating columns) is nondeterministic:

mysql> SELECT * FROM ratings ORDER BY category;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
|  2 |        3 |    5.0 |
|  7 |        3 |    2.7 |
+----+----------+--------+

Adding LIMIT to the query may change the order of rows that share a category value:

mysql> SELECT * FROM ratings ORDER BY category LIMIT 5;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  4 |        2 |    3.5 |
|  3 |        2 |    3.7 |
|  6 |        2 |    3.5 |
+----+----------+--------+

The rows with id 3 and id 4 have swapped positions.

In practice, a stable result order is often important. In that case, add more columns to ORDER BY to make the order deterministic.

With id added to the ORDER BY clause of the example above, the statements and results are as follows:

mysql> SELECT * FROM ratings ORDER BY category, id;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
|  2 |        3 |    5.0 |
|  7 |        3 |    2.7 |
+----+----------+--------+

mysql> SELECT * FROM ratings ORDER BY category, id LIMIT 5;
+----+----------+--------+
| id | category | rating |
+----+----------+--------+
|  1 |        1 |    4.5 |
|  5 |        1 |    3.2 |
|  3 |        2 |    3.7 |
|  4 |        2 |    3.5 |
|  6 |        2 |    3.5 |
+----+----------+--------+

With id added to the sort, the order is stable even when category values are equal. This matches our original solution.
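To try the ratings example locally, a setup along the following lines could be used. The column types are assumptions, since the example only shows the values, and a table this small may not reproduce the reordering seen without the tie-breaker.

```sql
-- Column types are assumptions; the example above only shows the values.
CREATE TABLE ratings (
  id       INT PRIMARY KEY,
  category INT NOT NULL,
  rating   DECIMAL(2,1) NOT NULL
);

INSERT INTO ratings (id, category, rating) VALUES
  (1, 1, 4.5), (2, 3, 5.0), (3, 2, 3.7), (4, 2, 3.5),
  (5, 1, 3.2), (6, 2, 3.5), (7, 3, 2.7);

-- Deterministic pages of 5 rows each, using id as the tie-breaker.
SELECT * FROM ratings ORDER BY category, id LIMIT 0, 5;
SELECT * FROM ratings ORDER BY category, id LIMIT 5, 5;
```

Because (category, id) is unique for every row, the two pages never overlap: the first returns ids 1, 5, 3, 4, 6 and the second returns ids 2 and 7.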

5. Summary

Starting from a pitfall we hit in practice, this article covered how MySQL optimizes LIMIT queries and gave a solution that meets the business need and avoids the logic error.

Many people use ORDER BY together with LIMIT. If you are not aware of these optimization behaviors, you may already have this bug in your code, and your data volume just has not triggered it yet.
