PrefaceIn MySQL, both Innodb and MyIsam use B+ tree as the index structure (other indexes such as hash are not considered here). This article will start with the most common binary search tree, and gradually explain the problems solved by various trees and the new problems they face, thereby explaining why MySQL chooses the B+ tree as the index structure. 1. Binary Search Tree (BST): UnbalancedA binary search tree (BST), also called a binary sorted tree, needs to satisfy the following conditions based on a binary tree: the values of all nodes on the left subtree of any node are not greater than the value of the root node, and the values of all nodes on the right subtree of any node are not less than the value of the root node. The following is a BST: When fast lookup is required, storing data in a BST is a common choice, because the query time depends on the tree height and the average time complexity is O(lgn). However, the BST may grow crooked and become unbalanced, as shown in the following figure. At this time, the BST degenerates into a linked list and the time complexity degenerates to O(n). To solve this problem, a balanced binary tree was introduced. 2. Balanced Binary Tree (AVL): Rotation Time-consumingThe AVL tree is a strictly balanced binary tree. The height difference between the left and right subtrees of all nodes cannot exceed 1. The search, insertion, and deletion of the AVL tree are O(lgn) in both the average and worst cases. The key to achieving balance in AVL lies in the rotation operation: insertion and deletion may destroy the balance of the binary tree. In this case, one or more tree rotations are needed to rebalance the tree. When inserting data, only one rotation (single or double rotation) is required at most; however, when deleting data, the tree will be unbalanced. AVL needs to maintain the balance of all nodes on the path from the deleted node to the root node, and the order of rotation is O(lgn). Due to the time-consuming rotation, the AVL tree is very inefficient when deleting data; when there are many deletion operations, the cost of maintaining balance may be higher than the benefits it brings, so AVL is not widely used in practice. 3. Red-black tree: the tree is too highCompared with AVL trees, red-black trees do not pursue strict balance, but rough balance: they just ensure that the longest possible path from the root to a leaf is no longer than twice the shortest possible path. From the implementation point of view, the biggest feature of the red-black tree is that each node belongs to one of the two colors (red or black), and the division of node colors needs to meet specific rules (the specific rules are omitted). The following are examples of red-black trees: Compared with the AVL tree, the query efficiency of the red-black tree will decrease because the tree is less balanced and has a higher height. However, the deletion efficiency of the red-black tree is greatly improved because the red-black tree also introduces color. When inserting or deleting data, only O(1) rotations and color changes are required to ensure basic balance, without the need for O(lgn) rotations like the AVL tree. In general, the statistical performance of red-black trees is higher than that of AVL. Therefore, in practical applications, AVL trees are relatively rarely used, while red-black trees are very widely used. For example, TreeMap in Java uses red-black trees to store sorted key-value pairs; HashMap in Java8 uses linked lists + red-black trees to solve hash collision problems (when there are fewer conflicting nodes, linked lists are used, and when there are more conflicting nodes, red-black trees are used). For situations where data is in memory (such as the above-mentioned TreeMap and HashMap), the performance of red-black trees is excellent. However, red-black trees are not good at handling data in auxiliary storage devices such as disks (such as databases such as MySQL) because they grow too high. When data is on disk, disk IO will become the biggest performance bottleneck, and the design goal should be to minimize the number of IO times. The higher the tree is, the more IO times are required for adding, deleting, modifying and checking, which will seriously affect performance. 4. B-tree: Born for diskB-tree, also known as B-tree (with no minus sign in it), is a multi-way balanced search tree designed for auxiliary storage devices such as disks. Compared with a binary tree, each non-leaf node of the tree can have multiple subtrees. Therefore, when the total number of nodes is the same, the height of the B-tree is much smaller than that of the AVL tree and the red-black tree (the B-tree is a "short and fat man"), and the number of disk IO times is greatly reduced. The most important concept in defining a B-tree is order. For an m-order B-tree, the following conditions must be met:
It can be seen that the definition of B-tree is mainly to limit the number of child nodes and records of non-leaf nodes. The following figure is an example of a 3-order B-tree: The advantage of B-tree is not only its small tree height, but also its utilization of the principle of locality of access. The so-called locality principle means that when a piece of data is used, the data nearby is more likely to be used in a short time. The B-tree stores data with similar keys in the same node. When a piece of data is accessed, the database reads the entire node into the cache. When the adjacent data is accessed next, it can be read directly from the cache without disk IO. In other words, the B-tree has a higher cache hit rate. B-trees have some applications in databases. For example, MongoDB's index uses the B-tree structure. However, in many database applications, a variant of the B-tree, the B+ tree, is used. 5. B+ TreeB+ tree is also a multi-way balanced search tree. The main differences between it and B tree are:
Therefore, B+ tree has the following advantages over B tree:
B+ trees also have disadvantages: since keys are repeated, they take up more space. However, compared with the performance advantages, the space disadvantage is often acceptable, so B+ trees are more widely used in databases than B trees. 6. Feel the power of B+ treeAs mentioned earlier, the biggest advantage of B-tree/B+ tree over binary trees such as red-black trees is that the tree height is smaller. In fact, for Innodb's B+ index, the tree height is generally 2-4 layers. Let's do some specific estimates. The height of the tree is determined by the order. The larger the order, the shorter the tree. The order depends on how many records each node can store. Each node in Innodb uses a page with a page size of 16KB, of which metadata only occupies about 128 bytes (including file management header information, page header information, etc.), and most of the space is used to store data.
For a 3-layer B+ tree, the first layer (root node) has 1 page, which can store 1000 records; the second layer has 1000 pages, which can store 1000 * 1000 records; the third layer (leaf node) has 1000 * 1000 pages, each page can store 100 records, so it can store 1000 * 1000 * 100 records, or 100 million records. For a binary tree, storing 100 million records requires about 26 layers. VII. ConclusionFinally, let’s summarize the problems solved by various trees and the new problems they face:
The above is the detailed content of the advantages of using B+ tree as the index structure of MySQL. For more information about MySQL B+ tree index structure, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: How to view image information in Docker
>>: XHTML introductory tutorial: Use of list tags
Table of contents Array deduplication 1. from() s...
1. Change the transparency to achieve the gradual...
Preface This article mainly introduces the analys...
Two-column layout is often used in projects. Ther...
1. Download the zip archive version from the offi...
1. Download the installation script - composer-se...
To beautify the table, you can set different bord...
Table of contents 1. Grammar 2. Examples 3. Other...
This article uses an example to describe how to i...
123WORDPRESS.COM provides you with the FileZilla ...
Recently, due to the need to test security produc...
Install mysql5.7 under win, for your reference, t...
1. Overall steps At the beginning, we introduced ...
Html event list General Events: onClick HTML: Mous...
This article describes how to use docker to deplo...