What are the advantages of using B+Tree as an index in MySQL?

What are the advantages of using B+Tree as an index in MySQL?

Why do databases need indexes?

We all know that database data is stored on disk. When our program is started, it is equivalent to a process running in the machine's memory. So when our program wants to query data, it must go out of the memory to the disk to find the data, and then write the data back to the memory. However, the IO efficiency of the disk is far lower than that of the memory, so the speed of searching for data directly affects the efficiency of the program.
The main purpose of adding indexes to a database is to use a suitable data structure, which can make data query more efficient, reduce the number of disk IOs, and increase the speed of data search, rather than a foolish global traversal.

Why does the index use the B+Tree data structure?

If we think simply, if we want to find data quickly, it seems that the hash table is the fastest. According to the key, hash it to a certain slot, and then we can find the location of the data accurately with just one search. How fast is that? However, when we are doing business, we often only need one piece of data. Most of the needs are to query a part of the data based on certain conditions. At this time, hash display is not very suitable.

Let's consider trees, such as binary trees, balanced binary trees, red-black trees, B-trees, etc. They all use binary search and are fast at finding numbers. However, no matter whether it is a balanced binary tree or an optimized red-black tree, they are all binary trees in the final analysis. When there are more nodes, their height will be higher. Let me find some data. If the root node is not there, then I will look for the next layer. If the next layer still doesn’t have the data, I will look for the next layer again. The consequence of this is that I may have to search for a piece of data several times, and each time a disk IO is executed. The purpose of our index is to reduce disk IO, so this design is not acceptable. So can we just reduce the height?
So let's consider the B-tree again. First, let's briefly introduce the data structure of the B-tree:
First, let's look at the definition of B-tree.

  1. Each node has at most m-1 keywords (key-value pairs that can be stored).
  2. The root node can have at least one keyword.
  3. A non-root node has at least m/2 keywords.
  4. The keywords in each node are arranged in ascending order. All keywords in the left subtree of each keyword are smaller than it, and all keywords in the right subtree are larger than it.
  5. All leaf nodes are located in the same layer, or the length from the root node to each leaf node is the same.
  6. Each node stores index and data, that is, the corresponding key and value.

Therefore, the range of the number of keywords for the root node is: 1 <= k <= m-1, and the range of the number of keywords for non-root nodes is: m/2 <= k <= m-1.

Here m represents the order, which indicates how many child nodes a node has at most, so the order of a B-tree needs to be specified when describing it.

Let's take another example to illustrate the above concept. For example, here is a 5-order B-tree with a root node number range of 1 <= k <= 4 and a non-root node number range of 2 <= k <= 4.

Next, we will explain the insertion process of the B-tree through an insertion example, and then explain the process of deleting keywords.

B-Tree Insert

When inserting, we need to remember a rule: determine whether the number of keys of the current node is less than or equal to m-1. If it is satisfied, just insert it directly. If not, use the middle key of the node to divide the node into two parts, and put the middle node into the parent node.

Example: In a 5-order B-tree, a node has a maximum of 4 keys and a minimum of 2 keys (Note: the following nodes are uniformly represented by one node to represent key and value).

Insert 18, 70, 50, 40

Insert 22

When inserting 22, it is found that the keyword of this node is already greater than 4, so it needs to be split. The rules for splitting have been mentioned above. After the split, it is as follows.

Then insert 23, 25, 39

Split and get the following.

Therefore, the number of nodes in each layer of the B-tree will increase. For the same amount of data, the B-tree will be lower than the binary tree, and the number of IO operations required will be reduced, so it meets our indexing requirements. So why did MySQL finally choose B+ tree? How is it better than B tree?
Let's first look at the differences between B+ trees and B trees:

  • The B+ tree leaf nodes contain all the key values ​​of the tree. Non-leaf nodes do not store data, only indexes. Data is stored in leaf nodes. In B-tree, each node stores index and data.
  • Each leaf node of the B+ tree stores pointers to adjacent leaf nodes, and the leaf nodes themselves are linked in ascending order according to the size of the keyword.

As shown in the figure:

First point: When non-leaf nodes only store index keys but not data, the space occupied by non-leaf nodes can be reduced. Nodes with the same capacity can store more indexes. For the same three-layer B+ tree, the number of levels will increase, and it can store more data than the B-tree.
The second point: B+ tree leaf nodes store pointers to adjacent leaf nodes. To understand the benefits of this pointer, we first need to know that when the disk reads data, it is often not strictly read on demand, but it will pre-read every time. Even if only one byte is needed, the disk will start from this position and read a certain length of data backward in sequence into the memory. The theoretical basis for this is the famous locality principle in computer science:

  • When a piece of data is used, the nearby data will usually be used immediately.
  • The data required during program execution is usually concentrated.

The length of the pre-read is generally an integer multiple of a page. A page is a logical block of computer management memory. Hardware and operating systems often divide main memory and disk storage areas into continuous blocks of equal size. Each storage block is called a page (in many operating systems, the page size is usually 4k). Main memory and disk exchange data in units of pages. When the data that the program wants to read is not in the main memory, a page fault exception will be triggered. At this time, the system will send a read signal to the disk. The disk will find the starting position of the data and read one or several pages continuously and load them into the memory. Then the exception will be returned and the program will continue to run.

Now let's look at the pointer of the B+ tree child node, and we understand its use. When reading in advance, it can ensure that the data read continuously is in order.

Some students may have mentioned the B* tree, which is based on the B+ tree and adds linked list pointers for non-leaf nodes. Personally, I think the B-star tree is unnecessary because we don't store data in non-leaf nodes. The data is all in the leaf nodes, and the linked list pointers in non-leaf nodes are not used.

Some fancy concepts

Clustered index and non-clustered index: As mentioned above, the leaf nodes of the B+ tree store the data of the index key, but different MySQL engines have different choices for storing data. MyISAM stores the index file and the actual data file in two files. The data stored in the index file is the address value of the data corresponding to the index key in the data file, while InnoDB stores the formal data in the leaf nodes. Therefore, clustering and non-clustering are to distinguish whether the data stored in the leaf nodes is real (it can be understood as whether the leaf nodes are crowded?)

Returning to the table: Returning to the table is also simple, but you must first understand the primary key index and the ordinary index. The leaf nodes we mentioned above store real data, which is only stored in the primary key index. The data stored in the ordinary index is the key of the primary key index. Then it is easier for us to understand. For example, I have created a normal index for the name field of a table. I want to select * from table where name = 'test'. When we find the test node, the key we get is only the primary key corresponding to this row of data. If we want to get the data of the entire row, we can only use this key to search the primary key index tree again. This operation is called table return.

Leftmost matching principle: When we create a new composite index, such as (name+age), when querying using where name = xx and age = xx, the composite index will be used, while where age = xx and name = xx will not be used. This is because the rule for MySQL to create a joint index is to first sort the leftmost field of the joint index, and then sort the second field based on the sorting of the first field.

The above is the detailed content of the advantages of using B+Tree as index in MySQL. For more information about the advantages of using B+Tree as index in MySQL, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • What are the advantages of using B+ tree index in MySQL?
  • What are the benefits of using B+ tree as index structure in MySQL?
  • Why does MySQL database index choose to use B+ tree?
  • How to get the height of MySQL innodb B+tree
  • Detailed explanation of the difference between MySQL normal index and unique index
  • A brief discussion on which fields in Mysql are suitable for indexing
  • MySQL uses covering index to avoid table return and optimize query

<<:  Is it necessary to give alt attribute to img image tag?

>>:  Web Design Teaching or Learning Program

Recommend

Detailed steps to change the default password when installing MySQL in Ubuntu

Step 1: Enter the directory: cd /etc/mysql, view ...

Analysis of the principle of centering elements with CSS

It is a very common requirement to set the horizo...

Using docker command does not require sudo

Because the docker daemon needs to bind to the ho...

MySQL database query performance optimization strategy

Optimize queries Use the Explain statement to ana...

Introduction to Linux compression and decompression commands

Table of contents Common compression formats: gz ...

CSS3 realizes the red envelope shaking effect

There is a requirement to realize the shaking eff...

Introduction to local components in Vue

In Vue, we can define (register) local components...

Installation tutorial of MySQL 5.1 and 5.7 under Linux

The operating system for the following content is...

Automatic file synchronization between two Linux servers

When server B (172.17.166.11) is powered on or re...

Common date comparison and calculation functions in MySQL

Implementation of time comparison in MySql unix_t...

Data storage implementation method in WeChat applet

Table of contents Global variable globalData Page...

javascript input image upload and preview, FileReader preview image

FileReader is an important API for front-end file...

20 CSS coding tips to make you more efficient (sorted)

In this article, we would like to share with you ...