Solution to data duplication when using limit+order by in MySql paging

Solution to data duplication when using limit+order by in MySql paging

summary

It is important to be able to explain complex knowledge in a simple way

During the learning process, we have read a lot of materials, videos, documents, etc. Because there are so many materials and videos now, there are often a variety of video forms to explain a knowledge point. In addition to promotional marketing, there are indeed many people whose video explanations are excellent. For example, Teacher Li Yongle’s short video lessons can explain such complex knowledge so easily and thoroughly on a blackboard. As for those of us who are learning programming, we not only need to learn how to explain the knowledge points clearly, but also how to write them clearly.

Problem Description

In MySQL, we usually use limit to perform page query. For example, limit(0,10) means listing 10 data on the first page, and limit(10,10) means listing the second page. However, when limit meets order by, records from the first page may appear again when you turn to the second page.

The details are as follows:

SELECT
  `post_title`,
  `post_date`
FROM
  post
WHERE
  `post_status` = 'publish'
ORDER BY
  view_count desc
LIMIT
  5, 5

When using the above SQL query, there is a high probability that a record identical to LIMIT 0,5 will appear. If you use the following method, there will be no duplication:

SELECT
  *
FROM
  post
WHERE
  post_status = 'publish'
ORDER BY
  view_count desc
LIMIT
  5, 5

However, since the post table has many fields, I only want to use these two fields and don't want to check post_content as well. To solve this problem, two sorting conditions are used after ORDER BY to solve this problem, as follows:

SELECT
  `post_title`,
  `post_date`
FROM
  post
WHERE
  `post_status` = 'publish'
ORDER BY
  view_count desc,
  ID asc
LIMIT
  5, 5

Logically speaking, MySQL sorting uses the primary key ID as the sorting condition by default. That is to say, if the view_count is equal, the primary key ID is used as the default sorting condition, and we do not need to add ID asc. But the fact is that when MySQL mixes order by and limit, sorting becomes chaotic.

Analyze the problem

In MySQL 5.6, the optimizer makes an optimization when encountering the order by limit statement, that is, using the priority queue.

The purpose of using priority queue is that when index order cannot be used, if you need to sort and use limit n, you only need to keep n records during the sorting process. Although this cannot solve the overhead of sorting all records, it only requires a small amount of memory in the sort buffer to complete the sorting.

The reason why MySQL 5.6 has the problem of duplicate data on the second page is that the priority queue uses the heap sorting method, which is an unstable sorting method. That is, the sorting results of the same values ​​may be inconsistent with the order of the read data.

MySQL 5.5 does not have this optimization, so this problem will not occur.

In other words, the problem mentioned in this article does not exist in MySQL 5.5, and this situation only occurred after version 5.6.

Let's look at the execution order when MySQL interprets the SQL language:

(1) SELECT
(2) DISTINCT <select_list>
(3) FROM <left_table>
(4) <join_type> JOIN <right_table>
(5) ON <join_condition>
(6) WHERE <where_condition>
(7) GROUP BY <group_by_list>
(8) HAVING <having_condition>
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>

The execution order is form… where… select… order by… limit…. Due to the priority queue mentioned above, after the select is completed, all records are arranged in heap sort. When performing order by, only the records with larger view_count values ​​are moved forward.

However, due to the limit factor, only 5 records need to be retained during the sorting process. view_count does not have index order, so when the second page of data is to be displayed, MySQL will take whichever record it sees. Therefore, when the sorting values ​​are the same, the first sorting is random. When the SQL is executed for the second time, the result should be the same as the first result.

Solving the problem

(1) Index sorting fields

If you add an index to the field, you can directly read and paginate according to the order of the index, thus avoiding this problem.

(2) Correctly understand paging

Pagination is based on sorting and is divided into quantity ranges. Sorting is a function provided by the database, while paging is a derived application requirement.

The official documentation of MySQL and Oracle provides methods of limit n and rownum < n, but does not clearly define the concept of paging.

Another important point is that although the above solution can alleviate this problem for users, according to users' understanding, there are still problems: for example, this table is frequently inserted, and when users query, the first page and the second page will still overlap under the read-committed isolation level.

Therefore, paging has always had this problem, and different scenarios do not have very high accuracy requirements for data paging.

(3) Some common database sorting problems

Sorting problem when no order by is added

When users use Oracle or MySQL, they find that MySQL is always in order, but Oracle is very chaotic. This is mainly because Oracle is a heap table and MySQL is an index clustered table. Therefore, when there is no order by, the database does not guarantee the order in which records are returned, and does not guarantee that each return is consistent.

Pagination problem Pagination duplication problem

As described above, paging is an application requirement derived from the sorting function provided by the database. The database does not guarantee the duplication of paging.

NULL value and empty string problem

Different databases understand and handle NULL values ​​and empty strings differently. For example, Oracle NULL and NULL values ​​cannot be compared. They are neither equal nor unequal, and are unknown. When inserting an empty string, MySQL treats it as an empty string with a length of 0, while Oracle directly processes the NULL value.

This concludes this article on how to solve the data duplication problem when using limit+order by in MySql paging. For more information about MySql limit+order by paging, please search 123WORDPRESS.COM's previous articles or continue browsing the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Solution to the problem that order by is not effective in MySQL subquery
  • Summary of three ways to implement ranking in MySQL without using order by
  • Detailed explanation of the pitfalls of mixing MySQL order by and limit
  • How to use MySQL group by and order by together
  • How to use indexes to optimize MySQL ORDER BY statements
  • Mysql sorting and paging (order by & limit) and existing pitfalls
  • Query process and optimization method of (JOIN/ORDER BY) statement in MySQL
  • MySQL briefly understands how "order by" works
  • Detailed explanation of Mysql's method of optimizing order by statement
  • Details on using order by in MySQL

<<:  Example code for using CSS cross-fade() to achieve a semi-transparent background image effect

>>:  How to deploy code-server using docker

Recommend

Using puppeteer to implement webpage screenshot function on linux (centos)

You may encounter the following problems when ins...

HTML Frameset Example Code

This article introduces a framework made by Frame...

Detailed analysis of MySQL instance crash cases

[Problem description] Our production environment ...

Tutorial on binary compilation and installation of MySql centos7 under Linux

// It took me a whole afternoon to install this, ...

nginx solves the problem of slow image display and incomplete download

Written in front Recently, a reader told me that ...

CenOS6.7 mysql 8.0.22 installation and configuration method graphic tutorial

CenOS6.7 installs MySQL8.0.22 (recommended collec...

MySQL 8.0.18 installation and configuration graphic tutorial

Learning objectives: Learn to use Windows system ...

Docker uses the Prune command to clean up the none image

Table of contents The creation and confusion of n...

Vue project code splitting solution

Table of contents background Purpose Before split...

MySQL-group-replication configuration steps (recommended)

MySQL-Group-Replication is a new feature develope...

HTML is actually the application of learning several important tags

After the article "This Will Be a Revolution&...

Nginx rtmp module compilation arm version problem

Table of contents 1. Preparation: 2. Source code ...

How to implement remote connection for Redis under Linux

After installing Redis on Linux, use Java to conn...

Use dockercompose to build springboot-mysql-nginx application

In the previous article, we used Docker to build ...