Why is it slow when using limit and offset paging scenarios?

Why is it slow when using limit and offset paging scenarios?

Let’s start with a question

Five years ago when I was at Tencent, I found that MySQL request speed was very slow in paging scenarios. When the data volume is only 10w, select xx from a single machine takes about 2 or 3 seconds.

I asked my master why, and he asked back, "In the index scenario, what is the time complexity of getting the nth largest number in MySQL?"

The search for answers

Confirm the scenario

Assume that there is an index on status. select * from table where status = xx limit 10 offset 10000.

It will be very slow. If the amount of data is not large, there will be a delay of several seconds.

Xiaobai answers

At that time, I felt very safe. My teacher would take care of me no matter what happened. My technical skills were the worst in the group anyway, so I made a blind guess and thought that finding a node would be just log(N). Naturally, my master let me study it on my own.

This stage took 10 minutes.

Continue answering

If you analyze it carefully, you will find that it is awkward to search through the index. Because you don't know the distribution of the first 100 numbers in the left subtree and the right subtree, it is impossible to use the search characteristics of the binary tree.

Through learning, I learned that MySQL's index is a B+ tree.

After looking at this picture, everything became clear. The 100th largest tree can be found directly through the linked list composed of leaf nodes with a complexity of O(n). But even if it is O(n), it is not so slow that it is outrageous. Is there any reason?

During this stage, I mainly searched for information online, which took me 10 days on and off.

System learning

Here are two books recommended. One is "MySQL Technology Insider InnoDB Storage Engine", through which you can have a deeper understanding of InnoDB's implementation mechanism, such as mvcc, index implementation, and file storage.

The second one is "High Performance MySQL", which starts from the usage level, but goes into depth and mentions many design ideas.

By combining the two books and studying them repeatedly, you can barely master MySQL.

There are two key concepts here:

  • Clustered index: contains the primary key index and the corresponding actual data. The leaf node of the index is the data node.
  • Auxiliary index: It can be understood as a secondary node, whose leaf node is also an index node and contains the primary key id.

Even if the first 10,000 will be thrown away, MySQL will use the primary key ID on the secondary index to check the data on the clustered index. This is 10,000 random IOs, so it is naturally as slow as a husky.

You may wonder why this behavior occurs. This is related to the layering of MySQL. Limit offset can only be used for the result set returned by the engine layer. In other words, the engine layer is also innocent, and it does not know that these 10,000 pieces are going to be thrown away.

The following is a diagram of MySQL layering. You can see that the engine layer and the server layer are actually separate.

Until this point, I roughly understood the reason for the slowness. This stage took one year.

comprehend by analogy

I had been working for 3 years at this time and started to look at some source code. After reading etcd, I read some tidb source code. Regardless of the type of database, a query statement is actually composed of logical operators.

Introduction to Logical Operators

Before writing specific optimization rules, let's briefly introduce some logical operators in the query plan.

  • DataSource is the data source, that is, the table, t in select * from t.
  • Selection, for example, select xxx from t where xx = 5, where the filter condition is.
  • Projection, selecting c from t in the query "select c from t" is a projection operation.
  • Join connection, select xx from t1, t2 where t1.c = t2.c means joining the two tables t1 and t2.

Selection, projection, and join (SPJ for short) are the most basic operators. There are many join modes, including inner join, left outer join, right outer join, etc.

After select b from t1, t2 where t1.c = t2.c and t1.a > 5 becomes a logical query plan, the DataSource corresponding to t1 t2 is responsible for retrieving the data.

A Join operator is added above to connect the results of the two tables according to t1.c = t2.c, then a Selection filter is performed according to t1.a > 5, and finally column b is projected.

The following figure is an unoptimized representation:

So it is not that MySQL does not want to pass limit and offset to the engine layer, but because the logical operators are divided, it is impossible to know how much qualified data is contained in the specific operator.

How to solve it

"High Performance MySQL" mentions two solutions

Solution 1

According to the actual business needs, see if it can be replaced with the next page and previous page functions, especially on iOS and Android, where the previous complete paging was not common.

Here, limit and offset are replaced by the auxiliary index (i.e. search condition) id. When the id is called again, it needs to be returned to the front end.

Solution 2

Face it head on. Here is a concept: index coverage: when the data queried by the auxiliary index only includes the ID and the auxiliary index itself, there is no need to query the clustered index.

The idea is as follows: select xxx,xxx from in (select id from table where second_index = xxx limit 10 offset 10000) This sentence means that we first search for the unique database id value corresponding to the data from the conditional query. Because the primary key is already on the secondary index, there is no need to return to the disk of the clustered index to pull it. Then use these 10 limited primary key IDs to query the clustered index. This will only do ten random IOs.

When the business really needs paging, using this solution can greatly improve performance. Usually meets performance requirements.

Final Thoughts

I am very grateful to my master for his guidance and patience in the three years before my graduation. He assigned me reading tasks during holidays, checked on my learning progress during lunch breaks, and guided me to explore problems by asking questions. After I graduated from Tencent, he gave me a lot of advice every time we met, imparted his knowledge and answered my questions, and he did his best in every aspect.

The above is the full content of this article. I hope it will be helpful for everyone’s study. I also hope that everyone will support 123WORDPRESS.COM.

You may also be interested in:
  • Laravel custom paging implementation example offset() and limit()

<<:  How to use Vue's idea to encapsulate a Storage

>>:  Linux user and group command example analysis [switching, adding users, permission control, etc.]

Recommend

Initial settings after installing Ubuntu 16 in the development environment

The office needs Ubuntu system as the Linux devel...

Detailed examples of the difference between methods watch and computed in Vue.js

Table of contents Preface introduce 1. Mechanism ...

Implementation of MySQL multi-version concurrency control MVCC

Transaction isolation level settings set global t...

Implementation of VUE infinite level tree data structure display

Table of contents Component recursive call Using ...

A brief discussion on read-only and disabled attributes in forms

Read-only and disabled attributes in forms 1. Rea...

Markup Language - Anchor

Previous: Markup Language - Phrase Elements Origin...

How to modify the group to which a user belongs in Linux

Modify the group to which a user belongs in Linux...

What is the base tag and what does it do?

The <base> tag specifies the default addres...

Example code and method of storing arrays in mysql

In many cases, arrays are often used when writing...

Detailed tutorial on building Gitlab server on CentOS8.1

There is no need to say much about the difference...

Detailed explanation of MySql installation and login

Check if MySQL is already installed in Linux sudo...

Summary of common knowledge points required for MySQL

Table of contents Primary key constraint Unique p...

JavaScript to achieve fancy carousel effect

This article shares two methods of implementing t...

Implementation code of jquery step progress axis plug-in

A jQuery plugin every day - step progress axis st...

Detailed explanation of MySQL string concatenation function GROUP_CONCAT

In the previous article, I wrote a cross-table up...