We all know that the underlying data structure of MySQL is B+ tree, so why not use red-black tree or other data structures? The red-black tree is a self-balancing binary search tree. The hashmap in Java8 uses the red-black tree to optimize its query efficiency. It can be seen that the query efficiency of the red-black tree is still relatively high. But why does MySQL use B+ trees instead of red-black trees at the bottom layer? The following figure shows the situation after the red-black tree is inserted into 1, 2, 3, 4, 5, and 6 in sequence: Then insert 7 into the red-black tree above: It can be seen that although the red-black tree has been self-balanced, the data as a whole is still biased towards the right side of the tree. If more data is added, after millions or tens of millions of data are added, the level of the tree will be very high. When querying, each additional layer will require one more IO. When the number of tree levels increases, the search efficiency will be very slow. At this time, someone may ask, why not use the more balanced AVL tree? The AVL tree after inserting 1,2,3,4,5,6,7 at once looks like this: It does look much better and the number of tree layers has decreased, but AVL still does not solve the fundamental problem. When the amount of data reaches millions or tens of millions, the number of tree layers will still be relatively large. Not to mention the cost of maintaining the balance of the AVL tree, the number of AVL tree layers alone cannot meet our requirements. So what kind of data structure can keep the number of layers small when the data volume reaches millions, tens of millions, or even larger? Obviously, if we want to reduce the number of layers, we must store more data in each layer. No matter how well balanced a binary tree is, it can only have two forks at each node. The amount of data in each layer is limited by the data structure, so we cannot choose from a binary tree. So at this time, the advantage of B-tree is reflected. Each node of B-tree can store multiple elements, and each element can have a fork. The following figure shows that each node of B-tree can store up to 3 elements: It can be seen that the tree level is reduced to two levels. If the maximum number of elements that can be stored in each node at a time is large enough, then even if the amount of data reaches tens of millions, the tree level can be controlled within an acceptable range. But there is another problem with B-tree. The following figure shows the situation when the B-tree level reaches three levels: If I need to take out elements 5-10 now, when I find element 5 through layer-by-layer query, and then find that other elements are not in this node, I need to query other elements through local in-order traversal. After finding 7, I need to do the same operation to find 8, 9, and 10, which will increase the number of IO times, so there is a B+ tree. B+ tree is an optimization of B tree, mainly from two aspects: The first optimization is to add a bidirectional pointer between each leaf node, pointing to the adjacent nodes. This solves the range query problem mentioned above. If the range query spans multiple nodes, the adjacent nodes can be quickly found through this bidirectional pointer without the need for local in-order traversal, thereby reducing the number of IO times. The following figure demonstrates the B+ tree: But what if the element you are looking for is not in the leaf node? Don't worry, another optimization of the B+ tree is that the leaf nodes contain all the elements of the tree! The non-leaf nodes of the B+ tree no longer store the data or pointers of the elements, but only serve as redundant indexes to form a complete B+ tree for easy query. It can be seen that element 15 in the above figure exists not only in non-leaf nodes, but also in leaf nodes. Although this design brings a lot of redundant indexes, it eliminates the need to search upwards for non-leaf nodes when performing range queries. In addition, the number of indexes that can be stored at each level increases, allowing the database to query more index elements each time it performs IO. After all, under normal circumstances, the space occupied by data is much larger than that occupied by indexes. (It should be noted that although both InnoDB and MyISAM engines use B+ trees, InnoDB's clustered index and data are stored together, while MyISAM stores the clustered index and the corresponding data pointer together, and the index and data are separate. The B+ tree under the MyISAM engine also only stores data pointers in leaf nodes.) From the above analysis, we can know that the reason why B+ tree is chosen as the underlying layer of MySQL is to reduce the number of IO operations. So why don’t we go to the extreme and use hash to save data or indexes? In fact, MySQL does support hash type indexes. However, hash indexes are generally not used, mainly because hash indexes store hash codes, and the storage order has nothing to do with the value size of the index column. Therefore, hash indexes are only effective when performing exact searches, and a full table scan will be performed for range queries. At the same time, if the amount of data in the table is very large, the number of hash collisions will increase, and the efficiency of a single search may not be higher than that of the B+ tree. To summarize briefly, compared with other trees, each node of the B+ tree can store more elements, which can greatly reduce the number of IO times required for queries. The design of non-leaf nodes not storing data or pointers can increase the number of elements stored in each node, and the bidirectional pointers of leaf nodes can improve the efficiency of range queries. This concludes this article on why MySQL uses B+ tree as the underlying data structure. For more information about MySQL B+ tree, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: A brief discussion on the solution to the problem of native page compatibility with IE9
>>: Detailed explanation of how to use Vue+element to implement the tag at the top of the page
In the past, when I needed the border length to b...
MySQL is a multi-user managed database that can a...
1. Write a simple Java program public class tests...
Recently, when using Apple.com/Ebay.com/Amazon.co...
Although the frequency of starting the shell is v...
Table of contents 1. Directive custom directive 2...
CSS3 achieves cool 3D rotation perspective 3D ani...
1. Installation of MYSQL 1. Open the downloaded M...
Tomcat is widely known as a web container. It has...
Preface I have always wanted to know how a SQL st...
Table of contents Array deduplication 1. from() s...
This article shares the installation and configur...
> Deploy MySQL 5.7 cluster master & slave ...
Copy code The code is as follows: Difference betw...
This article uses examples to describe the add, d...