Detailed introduction to MySQL database index

Detailed introduction to MySQL database index

If you want to understand why MySQL can retrieve data quickly, then you must understand the index principle of MySQL.

Mind Map

Simple understanding

You can think of the index as the table of contents of a book. We can use the index to quickly find the data we need, which is roughly like the picture below. The index is like the binary tree on the right. Each node points to the physical address of specific data. First, find the location of the data through the binary tree, and then get the data from the physical disk.

However, different binary trees have different characteristics, and we also need to choose a suitable tree as an index, so let's learn about the characteristics of each tree.

Evolution of indexing models

Binary Search Tree

A binary search tree is based on an array and uses the binary search technique to use the intermediate nodes as pointers. In this way, the value of the left subtree of each node is less than the value of the node, and the value of the right subtree of each node is greater than the value of the node. When searching for an element, we can remove nearly half of the search range each time after comparing it with the root node, which can greatly speed up the search.

advantage:

Easy to insert, no need to arrange in series

Using the tree's unique features, it is very convenient to search

shortcoming:

If the maximum value is inserted every time, it will become a linked list, and the search complexity will increase.

The more elements you insert, the higher the tree will be, resulting in poor query performance.

Self-balancing binary tree

Compared to a binary tree, a self-balancing binary tree ensures that the height difference between the left subtree and the right subtree does not exceed one by rotating left or right. This solves the problem of turning a binary search tree into a linked list.

However, if there are more elements, the height of the tree can easily become very high, which will slow down the query efficiency. In order to solve this problem, B-tree came into being.

B-Tree

The biggest difference of B-tree is that it is no longer limited to having only one node, but allows multiple nodes, which is a multi-branch tree. And all leaf nodes of the B-tree must be at the same level, that is, they have the same depth

For example, if a B-Tree with degree d indexes N keys, then the upper limit of its tree height h is logn(N/2). The asymptotic complexity of searching the number of nodes for a key is O(logn((N+1)/2)). From this point we can see that B-Tree is a very efficient index data structure.

Locality Principle

This multi-node structure can also make good use of the disk pre-reading feature.

Due to the characteristics of storage media, disk access itself is much slower than main memory. In addition to the mechanical movement consumption, the disk access speed is often a few hundredths of the main memory. Therefore, in order to improve efficiency, disk I/O should be minimized. To achieve this goal, the disk is often not read strictly on demand, but will pre-read each time. Even if only one byte is needed, the disk will start from this position and read a certain length of data sequentially backwards into the memory. The theoretical basis for this is the famous locality principle in computer science: when a piece of data is used, the nearby data will usually be used immediately. The data required during program execution is usually concentrated. Since disk sequential reads are very efficient (no seek time is required and only a small amount of rotation time is required), pre-reading can improve I/O efficiency for programs with locality.

In a B-tree, the size of a node is set equal to a page so that each node can be fully loaded with only one I/O. To achieve this goal, the following techniques are required in the actual implementation of B-tree: <br /> Each time a new node is created, a page of space is directly requested. This ensures that a node is physically stored in a page. In addition, computer storage allocation is aligned by page, so that a node only requires one I/O.

However, each node of the B-tree contains data (index + record), and the size of the user's record data is likely to far exceed the index data, which requires more disk I/O operations to read "useful index data." Moreover, when we query a node at the bottom layer (such as A record), the record data in the "non-A record node" will be loaded from the disk to the memory, but this record data is useless. We just want to read the index data of these nodes for comparison queries, and the record data in the "non-A record node" is useless to us. This not only increases the number of disk I/O operations, but also occupies memory resources.

B+ Tree

MySQL generally uses B+ tree to implement its index structure. Compared with B tree, B+ tree has the following differences

Leaf nodes (bottom-most nodes) store actual data (index + record), while non-leaf nodes only store indexes;

All indexes will appear in leaf nodes, and the leaf nodes form an ordered linked list;

The index of non-leaf nodes also exists in the child nodes and is the maximum (or minimum) of all indexes in the child nodes.

There are as many indexes as there are child nodes in a non-leaf node;

The non-leaf nodes of the B+ tree do not store actual record data, but only indexes. Therefore, when the amount of data is the same, compared with the B-tree that stores both indexes and records, the non-leaf nodes of the B+ tree can store more indexes. Therefore, the B+ tree can be "shorter and fatter" than the B-tree, and the number of disk I/O times for querying the underlying nodes will be less.

As a multi-branch tree, B+ does not cause complex tree deformation when deleting or inserting nodes even if there are a large number of redundant nodes.

In the database, optimization is also performed based on the B+ tree, and sequential access pointers are added. The purpose of this optimization is to improve the performance of interval access. For example, if you want to query all data records with keys from 18 to 49, after finding 18, you only need to traverse the nodes and pointers in order to access all data nodes at once, which greatly improves the efficiency of interval query. <br />The B-tree does not have a structure that connects all leaf nodes in series with a linked list, so range queries can only be completed by traversing the tree, which involves disk I/O operations on multiple nodes. The range query efficiency is not as good as that of the B+ tree. Therefore, B+ trees are suitable for scenarios with a large number of range retrievals, such as databases. For scenarios with a large number of single index queries, you can consider B-tree, such as NoSQL's MongoDB.

In MySQL, the leaf nodes of the B+ tree are connected by a "bidirectional linked list", which has the advantage that it can be traversed both to the right and to the left.

Clustered index and secondary index

Clustered index (primary key index): puts data and index together. The leaf nodes of the index structure store row data. Finding the index also finds the data.

Secondary index (non-primary key index): Store data and index separately. The leaf nodes of the index structure store the primary key value.

When InnoDB creates a clustered index, it selects different columns as indexes based on different scenarios:

If there is a primary key, it will be used as the index key of the clustered index by default;

If there is no primary key, select the first unique column that does not contain NULL values ​​as the index key of the clustered index;

In the absence of the above two cases, InnoDB will automatically generate an implicit auto-increment id column as the index key of the clustered index;

Because the data in the table is stored in the leaf nodes of the clustered index, the InnoDB storage engine will definitely create a clustered index for the table. And because only one copy of the data is physically saved, there can only be one clustered index, but multiple secondary indexes can be created.

For example, the (ID, k) values ​​in the figure are (100, 1), (200, 2), (300, 3), (500, 5), and (600, 6)

The difference when querying:

If the statement is select * from T where ID=500, that is, the primary key query method, then only the B+ tree of ID needs to be searched;

If the statement is select * from T where k=5, that is, the normal index query method, you need to first search the k index tree to get the ID value of 500, and then search the ID index tree again. This process is called table return .

In other words, queries based on non-primary key indexes need to scan one more index tree. Therefore, we should try to use primary key queries in our applications.

Summarize

This is the end of this article about the detailed introduction of MySQL database index. For more relevant MySQL index content, please search for previous articles on 123WORDPRESS.COM or continue to browse the related articles below. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to construct a table index in MySQL
  • How to maintain MySQL indexes and data tables
  • Detailed explanation of MySQL database index
  • MySQL Data Optimization - Multi-layer Index
  • Details of the underlying data structure of MySQL indexes
  • MySQL Database Indexes and Transactions
  • Detailed explanation of the principles of indexing MySQL tables

<<:  Gojs implements ant line animation effect

>>:  What are HTML inline elements and block-level elements and their differences

Recommend

CentOS 7 installation and configuration tutorial under VMware10

If Ubuntu is the most popular Linux operating sys...

Linux implements automatic and scheduled backup of MySQL database every day

Overview Backup is the basis of disaster recovery...

7 Ways to Write a Vue v-for Loop

Table of contents 1. Always use key in v-for loop...

How to add shortcut commands in Xshell

As a useful terminal emulator, Xshell is often us...

Docker starts in Exited state

After docker run, the status is always Exited Sol...

Example method of deploying react project on nginx

Test project: react-demo Clone your react-demo pr...

Detailed explanation of data type issues in JS array index detection

When I was writing a WeChat applet project, there...

A brief discussion on the fun of :focus-within in CSS

I believe some people have seen this picture of c...

Two ideas for implementing database horizontal segmentation

introduction With the widespread popularity of In...

Example of how to change the line spacing of HTML table

When using HTML tables, we sometimes need to chan...

Steps to install MySQL 5.7.10 on Windows server 2008 r2

Install using the MSI installation package Downlo...

Detailed explanation of Vue filter implementation and application scenarios

1. Brief Introduction Vue.js allows you to define...