Why do databases need indexes? We all know that database data is stored on disk. When our program is started, it is equivalent to a process running in the machine's memory. So when our program wants to query data, it must go out of the memory to the disk to find the data, and then write the data back to the memory. However, the IO efficiency of the disk is far lower than that of the memory, so the speed of searching for data directly affects the efficiency of the program. Why does the index use the B+Tree data structure?If we think simply, if we want to find data quickly, it seems that the hash table is the fastest. According to the key, hash it to a certain slot, and then we can find the location of the data accurately with just one search. How fast is that? However, when we are doing business, we often only need one piece of data. Most of the needs are to query a part of the data based on certain conditions. At this time, hash display is not very suitable. Let's consider trees, such as binary trees, balanced binary trees, red-black trees, B-trees, etc. They all use binary search and are fast at finding numbers. However, no matter whether it is a balanced binary tree or an optimized red-black tree, they are all binary trees in the final analysis. When there are more nodes, their height will be higher. Let me find some data. If the root node is not there, then I will look for the next layer. If the next layer still doesn’t have the data, I will look for the next layer again. The consequence of this is that I may have to search for a piece of data several times, and each time a disk IO is executed. The purpose of our index is to reduce disk IO, so this design is not acceptable. So can we just reduce the height?
Therefore, the range of the number of keywords for the root node is: 1 <= k <= m-1, and the range of the number of keywords for non-root nodes is: m/2 <= k <= m-1. Here m represents the order, which indicates how many child nodes a node has at most, so the order of a B-tree needs to be specified when describing it. Let's take another example to illustrate the above concept. For example, here is a 5-order B-tree with a root node number range of 1 <= k <= 4 and a non-root node number range of 2 <= k <= 4. Next, we will explain the insertion process of the B-tree through an insertion example, and then explain the process of deleting keywords. B-Tree InsertWhen inserting, we need to remember a rule: determine whether the number of keys of the current node is less than or equal to m-1. If it is satisfied, just insert it directly. If not, use the middle key of the node to divide the node into two parts, and put the middle node into the parent node. Example: In a 5-order B-tree, a node has a maximum of 4 keys and a minimum of 2 keys (Note: the following nodes are uniformly represented by one node to represent key and value). Insert 18, 70, 50, 40 Insert 22 When inserting 22, it is found that the keyword of this node is already greater than 4, so it needs to be split. The rules for splitting have been mentioned above. After the split, it is as follows. Then insert 23, 25, 39 Split and get the following. Therefore, the number of nodes in each layer of the B-tree will increase. For the same amount of data, the B-tree will be lower than the binary tree, and the number of IO operations required will be reduced, so it meets our indexing requirements. So why did MySQL finally choose B+ tree? How is it better than B tree?
As shown in the figure: First point: When non-leaf nodes only store index keys but not data, the space occupied by non-leaf nodes can be reduced. Nodes with the same capacity can store more indexes. For the same three-layer B+ tree, the number of levels will increase, and it can store more data than the B-tree.
The length of the pre-read is generally an integer multiple of a page. A page is a logical block of computer management memory. Hardware and operating systems often divide main memory and disk storage areas into continuous blocks of equal size. Each storage block is called a page (in many operating systems, the page size is usually 4k). Main memory and disk exchange data in units of pages. When the data that the program wants to read is not in the main memory, a page fault exception will be triggered. At this time, the system will send a read signal to the disk. The disk will find the starting position of the data and read one or several pages continuously and load them into the memory. Then the exception will be returned and the program will continue to run. Now let's look at the pointer of the B+ tree child node, and we understand its use. When reading in advance, it can ensure that the data read continuously is in order. Some students may have mentioned the B* tree, which is based on the B+ tree and adds linked list pointers for non-leaf nodes. Personally, I think the B-star tree is unnecessary because we don't store data in non-leaf nodes. The data is all in the leaf nodes, and the linked list pointers in non-leaf nodes are not used. Some fancy conceptsClustered index and non-clustered index: As mentioned above, the leaf nodes of the B+ tree store the data of the index key, but different MySQL engines have different choices for storing data. MyISAM stores the index file and the actual data file in two files. The data stored in the index file is the address value of the data corresponding to the index key in the data file, while InnoDB stores the formal data in the leaf nodes. Therefore, clustering and non-clustering are to distinguish whether the data stored in the leaf nodes is real (it can be understood as whether the leaf nodes are crowded?) Returning to the table: Returning to the table is also simple, but you must first understand the primary key index and the ordinary index. The leaf nodes we mentioned above store real data, which is only stored in the primary key index. The data stored in the ordinary index is the key of the primary key index. Then it is easier for us to understand. For example, I have created a normal index for the name field of a table. I want to select * from table where name = 'test'. When we find the test node, the key we get is only the primary key corresponding to this row of data. If we want to get the data of the entire row, we can only use this key to search the primary key index tree again. This operation is called table return. Leftmost matching principle: When we create a new composite index, such as (name+age), when querying using where name = xx and age = xx, the composite index will be used, while where age = xx and name = xx will not be used. This is because the rule for MySQL to create a joint index is to first sort the leftmost field of the joint index, and then sort the second field based on the sorting of the first field. The above is the detailed content of the advantages of using B+Tree as index in MySQL. For more information about the advantages of using B+Tree as index in MySQL, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: Is it necessary to give alt attribute to img image tag?
>>: Web Design Teaching or Learning Program
Step 1: Enter the directory: cd /etc/mysql, view ...
It is a very common requirement to set the horizo...
Reference Documentation Official Docker installat...
Because the docker daemon needs to bind to the ho...
Optimize queries Use the Explain statement to ana...
Table of contents Common compression formats: gz ...
There is a requirement to realize the shaking eff...
In Vue, we can define (register) local components...
The operating system for the following content is...
When server B (172.17.166.11) is powered on or re...
Table of contents I've been learning React re...
Implementation of time comparison in MySql unix_t...
Table of contents Global variable globalData Page...
FileReader is an important API for front-end file...
In this article, we would like to share with you ...