Analysis of the reasons why MySQL's index system uses B+ tree

Analysis of the reasons why MySQL's index system uses B+ tree

1. What is an index?

An index is a decentralized storage structure created to speed up the retrieval of data rows in a table. (Just like the dictionary we used when we were young, it would be faster to find the corresponding word with a dictionary)

2. Why do we need indexes?

First we need to understand some concepts and knowledge

  1. Where is mysql data stored? ----disk
  2. Where is the problem usually when querying data is slow? ----IO
  3. (So ​​we need to improve the efficiency of IO, so how to improve it? ---- Two levels of frequency and quantity. For example, the effort consumed in moving bricks once and moving them ten times is different. The effort consumed in moving one brick at a time and moving ten bricks at a time (occupying IO resources) is also different. So we try to reduce the interaction with IO as much as possible while meeting our own needs)
  4. When reading data from the disk, do you read as much as you need? ----Disk pre-reading
  5. Disk pre-reading: When data is exchanged between memory and disk, there is usually a smallest logical unit, called a page, datapage. The size of a page is generally determined by the operating system, usually 4k or 8k. When we exchange data, we can read data in integer multiples of the page. The InnoDB storage engine reads 16k of data each time.
  6. Locality principle: Data and programs tend to cluster together, and previously accessed data may be queried again, involving spatial locality and temporal locality.

Through the above concepts, we roughly know what the index is used for - design the index system in advance, and when we query data, reduce the interaction with IO to improve our query efficiency.

3. How to design an index system?

Let’s first understand a few concepts

  • Where are indexes stored? ---- Disk , when querying data, the index will be loaded into memory first
  • What information does the index need when storing? What field values ​​need to be stored?

—— key : the value stored in the actual data row - file address (pointer, we need to rely on the file address to find where the data file is stored)
—— offset : offset (if we want to get a piece of data in the file, we need to use the offset)

  • What kind of data structure should be used to store the data in the format mentioned above?

—— From the above, we can see that our data format is of KV type. Knowing the KV format data, we know what data structure to use to store it, including hash table , tree ( binary tree , binary search tree , binary balanced tree , red-black tree , B tree , B+ tree )
To sum up, we can use the above data structure to design our index system

4.What is the MYSQL index system?

Why not store it in the format mentioned above?

As we all know, MySQL's index system uses B+ tree . Why is it B+ tree? Next, we analyze why other storage structures don’t work one by one. Before that, we still need to understand two prerequisites: OLAP and OLTP

The more data we store, the larger the corresponding index will be. When we read from disk to memory, IO problems will arise. So do we create indexes for indexes? No, so MySQL uses the B+ tree

5. Hash Table

insert image description here

The above is the storage structure of the hash table. Let's discuss the advantages and disadvantages of this type of storage structure:

  • Hash collisions will cause uneven data hashing and generate a large number of linear queries, which is time-consuming.
  • Range queries are not supported . When performing range queries, you must traverse each
  • The memory space requirement is relatively high (all data must be added to the memory)

advantage:
If it is an equal value query, it will be very fast

So is there a hash index in MySQL?

  • The memory storage engine uses a hash index.
  • InnoDB supports adaptive hashing

6. Tree

6.1 Binary Tree

insert image description here

The binary tree itself is unordered. When we search for data, we have to compare the data with each node one by one to see if it meets our data requirements, which is inefficient.

6.2 Binary Search Tree (BST)

insert image description here

Characteristics of binary search tree: data must be inserted in order, the left subtree must be smaller than the root node, and the right subtree must be guaranteed to be larger than the root node. Therefore, using a binary search tree compared to a binary tree obviously improves query efficiency.
However, if data is inserted in increasing or decreasing order, the binary search tree will degenerate into a linked list , and the search efficiency will be reduced.

insert image description here

6.3 Balanced Binary Tree (AVL Tree)

insert image description here

According to the problems exposed by the binary search tree, we use the AVL tree to balance the tree by left or right rotation. However, in order to ensure balance, rotation is necessary when inserting data, compensating for the improvement in query performance by the loss in insertion performance . It's fine if I read more and write less, but if I have the same number of read and write requests, it's not suitable.

6.4 Red-Black Trees

insert image description here

The red-black tree is also balanced by left and right rotations, and there is also color-changing behavior. The longest subtree only needs to be no longer than twice the shortest subtree...so the query performance and insertion performance can be approximately balanced . However, as data is inserted, it is found that the depth of the tree will become deeper. The deeper the depth, the more IO times it will have, which affects the efficiency of data reading.

6.5 B-Trees

In view of the problems exposed by the red-black tree, how should we improve the efficiency of reading? Can we change from an ordered binary tree to an ordered multi-branch tree so that we can store more data?

insert image description here

A degree of 4 means that a node stores three data values, and any values ​​exceeding that must be transformed. So how is the actual data stored? We need the key and the complete data row

insert image description here

The above picture shows how the B-tree actually stores data. Each node has three elements: key , pointer , and data .
For example, if I want to find the data 28, I first start from disk block 1 and find that it cannot be read. After comparing the range to disk block 3 pointed to by the p2 pointer, it still cannot be found. Then I find 28 according to the p2 pointer of disk block 3 pointing to disk block 8. Let's analyze it. Each disk block is 16kb in size . We only need to read 48kb to search three disk blocks. How many records can a three-layer B-tree store ?

Let's idealize it and assume that keys and pointers do not take up space, and one piece of data takes up 1k of space. Then Disk 1 can store 16 pieces of data, Disk 3 also has 16 pieces of data, and Disk 8 also has 16 pieces of data. In this case, we can only store 16 + 16 + 16 = 4096 records, which is obviously a bit too little. In fact, keys and pointers also take up space.

So we can't help but wonder, why is the amount of stored data so small?
We find that the size of each layer of storage is occupied by data, so can we only store keys and pointers? For this reason, the B+ tree is introduced

6.6 B+Tree

insert image description here

The evolution from B-tree to B+tree: non-leaf nodes do not store data, only leaf nodes store data

insert image description here

In the above figure, we can assume that p1 and 28 are a group of 10 bytes , so the first layer can store 16000/10=1600 such size, the second layer is also 1600, and the third layer data occupies 1kb, which is 16 records, so the total storage is 1600 1600 16=40960000 ( 40.96 million ) records

The MySQL index structure is generally 3 to 4 layers, but there is one issue that needs attention. Assuming we have a three-layer storage structure, how can we store more data?
We just assumed that p1 and 28 are 10 bytes in size. What if they are 1 byte? Then the total storage capacity is 16000 16000 10=4096000000. So this leads to the question that is always asked in interviews: is it better to use int or var to create indexes?

Answer: The shorter the key length, the better. For varchar with a length less than 4 bytes, use varchar; for varchar with a length greater than 4 bytes, use int.

According to the characteristics of B+ tree, it has large storage capacity and fast query, so MySQL uses B+ tree.

Summarize

This concludes the explanation of why the MySQL index system uses B+ trees. If I have said anything wrong, I hope you can remind me and correct it.

This concludes this article on why MySQL's index system uses B+ trees. For more information about MySQL index B+ trees, please search for previous articles on 123WORDPRESS.COM or continue browsing the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Why does MySQL database index choose to use B+ tree?
  • What are the benefits of using B+ tree as index structure in MySQL?
  • What are the advantages of using B+ tree index in MySQL?
  • The reason why MySQL uses B+ tree as its underlying data structure
  • Detailed explanation of the difference between B-tree index and B+ tree index in MySQL
  • Detailed explanation of MySQL B+ tree index and hash index
  • An article to understand why the MySQL index data structure uses B+ tree

<<:  An example of vertical centering of sub-elements in div using Flex layout

>>:  How to install Elasticsearch7.6 cluster in docker and set password

Recommend

Detailed explanation of virtual DOM and diff algorithm in react

The role of virtual DOM First of all, we need to ...

Docker builds python Flask+ nginx+uwsgi container

Install Nginx First pull the centos image docker ...

Mini Program Development to Implement Unified Management of Access_Token

Table of contents TOKEN Timer Refresher 2. Intern...

Specific use of pthread_create in linux to create threads

pthread_create function Function Introduction pth...

Vue uses echarts to draw an organizational chart

Yesterday, I wrote a blog about the circular prog...

Summary of considerations for writing web front-end code

1. It is best to add a sentence like this before t...

Detailed installation and use of SSH in Ubuntu environment

SSH stands for Secure Shell, which is a secure tr...

How to control the startup order of docker compose services

summary Docker-compose can easily combine multipl...

Vue ElementUI implements asynchronous loading tree

This article example shares the specific code of ...

Introduction to basic concepts and technologies used in Web development

Today, this article introduces some basic concept...

Linux centOS installation JDK and Tomcat tutorial

First download JDK. Here we use jdk-8u181-linux-x...