Let's start with a question

Five years ago, when I was at Tencent, I noticed that MySQL queries were very slow in paging scenarios. With only 100,000 rows, a paged select on a single machine took about 2 or 3 seconds. I asked my mentor why, and he asked back: "With an index, what is the time complexity of getting the nth largest value in MySQL?"

The search for an answer

Confirming the scenario

Assume that there is an index on status. Then

select * from table where status = xx limit 10 offset 10000;

will be very slow: even if the amount of data is not large, there will be a delay of several seconds.

A beginner's answer

At that time I felt very safe. My mentor would look after me no matter what happened, and my skills were the weakest in the group anyway, so I made a blind guess: finding a node should just be O(log N). Naturally, my mentor had me study it on my own. This stage took 10 minutes.

A second attempt

If you analyze it carefully, you will find that the index cannot answer this query directly. Because you don't know how the first n matching rows are distributed between the left and right subtrees, you cannot use the binary search property of the tree. Through further reading I learned that MySQL's index is a B+ tree, and after seeing a diagram of one everything became clear: the nth value can be found by walking the linked list formed by the leaf nodes, with complexity O(n). But even O(n) is not so slow as to be outrageous; is there another reason? During this stage I mainly searched for information online, which took me 10 days on and off.

Systematic learning

Two books are worth recommending here. The first is "MySQL Technical Insider: InnoDB Storage Engine", which gives a deeper understanding of InnoDB's implementation, such as MVCC, the index implementation, and file storage.
The second is "High Performance MySQL", which starts from the usage level but goes deep and touches on many design ideas. Studying the two books together, repeatedly, gives you a workable grasp of MySQL. There are two key points here:
1. Even though the first 10,000 rows will be thrown away, MySQL still uses the primary key id stored in the secondary index to look up each full row on the clustered index. That is 10,000 random IOs, which is naturally as slow as a husky.

2. You may wonder why this behavior occurs. It comes down to MySQL's layered architecture: LIMIT and OFFSET are applied by the server layer to the result set returned by the engine layer. In other words, the engine layer is innocent too; it has no idea that those 10,000 rows are about to be thrown away.

The figure below shows MySQL's layering; you can see that the engine layer and the server layer are actually separate. At this point I roughly understood the reason for the slowness. This stage took one year.

Learning by analogy

By this time I had been working for 3 years and had started reading source code. After etcd, I read some of the TiDB source. Regardless of the kind of database, a query statement is ultimately built from logical operators.

Introduction to logical operators

Before writing specific optimization rules, let's briefly introduce some of the logical operators in a query plan.
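The two points above can be sketched as a toy model. This is a hypothetical simulation, not InnoDB's actual code: a "secondary index" of (key, primary-key) pairs, a "clustered index" keyed by primary key, and a server layer that applies OFFSET only after each full row has already been fetched.

```python
# Hypothetical model of why LIMIT 10 OFFSET 10000 costs ~10,010 random IOs:
# the engine fetches every matching row from the clustered index BEFORE the
# server layer discards the first `offset` of them.

class CountingEngine:
    def __init__(self, rows):
        self.secondary = [(r["status"], r["id"]) for r in rows]  # (key, pk) entries
        self.clustered = {r["id"]: r for r in rows}              # pk -> full row
        self.random_ios = 0

    def scan(self, status):
        for key, pk in self.secondary:        # sequential walk of index leaves
            if key == status:
                self.random_ios += 1          # one back-to-table lookup per match
                yield self.clustered[pk]      # full row fetched unconditionally

def server_limit_offset(engine, status, offset, limit):
    """Server layer: skip `offset` rows it already paid to fetch."""
    out = []
    for i, row in enumerate(engine.scan(status)):
        if i >= offset:                       # OFFSET applied after the fetch
            out.append(row)
            if len(out) == limit:
                break
    return out

rows = [{"id": i, "status": "xx"} for i in range(20_000)]
eng = CountingEngine(rows)
page = server_limit_offset(eng, "xx", offset=10_000, limit=10)
print(len(page), eng.random_ios)  # 10 rows returned, 10010 random IOs paid
```

The counter makes the asymmetry concrete: ten useful rows still cost ten thousand wasted clustered-index lookups.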
Selection, Projection, and Join (SPJ for short) are the most basic operators. There are many join types, including inner join, left outer join, right outer join, and so on.

After select b from t1, t2 where t1.c = t2.c and t1.a > 5 becomes a logical query plan, the DataSource operators for t1 and t2 are responsible for reading the data. A Join operator above them connects the results of the two tables according to t1.c = t2.c, then a Selection filters on t1.a > 5, and finally column b is projected. The figure below shows the unoptimized plan.

So it is not that MySQL does not want to push limit and offset down to the engine layer; rather, because of how the logical operators are divided, an individual operator cannot know how many qualifying rows it will produce.

How to solve it

"High Performance MySQL" mentions two solutions.

Solution 1

Check whether the business can use next-page/previous-page navigation instead of jumping to an arbitrary page; on iOS and Android in particular, full pagination was never common. Here limit and offset are replaced by the search condition plus the last seen id, which must be returned to the front end so it can be passed back on the next request.

Solution 2

Face it head on. The key concept is the covering index: when the data a query needs is contained entirely in the secondary index (the index columns plus the primary key), there is no need to touch the clustered index at all. The idea is as follows (MySQL does not allow LIMIT inside an IN subquery, so a derived-table join is used):

select t.* from table t
join (select id from table where second_index = xxx limit 10 offset 10000) tmp
  on t.id = tmp.id;

The inner query finds the primary-key ids of the qualifying rows using only the secondary index; since the primary key is already stored in the secondary index, there is no need to go back to the clustered index for the 10,000 skipped rows. Then only the 10 surviving ids are looked up on the clustered index, which costs just ten random IOs.
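Both solutions can be demonstrated end to end with SQLite from Python's standard library. This is an illustrative sketch: the table and column names are made up, and SQLite's planner differs from InnoDB's, so it shows the query shapes rather than the exact IO behavior.

```python
import sqlite3

# Illustrative schema: 20,000 rows, all with status 'xx', indexed on status.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, 'xx', ?)",
                 [(i, f"row{i}") for i in range(1, 20_001)])
conn.execute("CREATE INDEX idx_status ON t (status)")

# Solution 1: keyset ("next page") pagination -- seek past the last seen id
# instead of skipping `offset` rows. The index jumps straight to id > last_id.
def next_page(last_id, size=10):
    cur = conn.execute(
        "SELECT id FROM t WHERE status = ? AND id > ? ORDER BY id LIMIT ?",
        ("xx", last_id, size))
    return [r[0] for r in cur]

# Solution 2: deferred join -- resolve the 10 ids using only the index first,
# then fetch full rows for just those ids.
def deferred_join_page(offset, size=10):
    cur = conn.execute(
        "SELECT t.id, t.payload FROM t JOIN "
        "(SELECT id FROM t WHERE status = ? ORDER BY id LIMIT ? OFFSET ?) tmp "
        "ON t.id = tmp.id", ("xx", size, offset))
    return cur.fetchall()

print(next_page(10_000))           # ids 10001..10010, no rows skipped
print(len(deferred_join_page(10_000)))
```

With keyset pagination the client carries `last_id` between requests; the deferred join keeps arbitrary page jumps working while confining the full-row fetches to the final ten ids.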
When the business truly needs paging, this solution can greatly improve performance, and it usually meets the requirements.

Final thoughts

I am very grateful to my mentor for his guidance and patience over those three years. He assigned me reading for the holidays, checked my progress over lunch breaks, and led me to explore problems by asking questions. Even after I left Tencent, he gave me plenty of advice every time we met, shared his knowledge, and answered my questions; he did his best in every way.