In-depth analysis of MySQL index data structure

In-depth analysis of MySQL index data structure

Overview

An index is a structure that sorts the values ​​of one or more columns in a database table. An index can be used to quickly access specific information in a database table.

Index data structure

Binary Tree

A binary tree is an ordered tree in which the degree of the nodes in the tree is no more than 2. It is the simplest and most important tree. The recursive definition of a binary tree is: a binary tree is an empty tree, or a non-empty tree consisting of a root node and two non-intersecting left and right subtrees of the root; the left and right subtrees are also binary trees.

For the array {1,2,3,4,5} the data structure will become a linked list

Features:

  • There are two child nodes under the parent node.
  • The data of the right node is greater than the data of the left node.


Binary tree.png

Red-Black Tree

A red-black tree is a specific type of binary tree, which is a structure used in computer science to organize blocks of data such as numbers. If a binary search tree is a red-black tree, then any of its subtrees must be a red-black tree.

The red-black tree is a variant of a balanced binary search tree. The height difference between its left and right subtrees may be greater than 1, so the red-black tree is not a strictly balanced binary tree (AVL), but the cost of balancing it is low, and its average statistical performance is stronger than AVL.

Since each red-black tree is a binary sorted tree, when searching a red-black tree, the search algorithm applied to an ordinary binary sorted tree can be used, and color information is not required during the search process.

The red-black tree data structure is as follows:


Red-black tree data structure.png

Features:

  • A red-black tree is a binary search tree in which each node has a color attribute, either red or black.
  • The nodes are red or black.
  • The root node is black.
  • All leaves are black. (Leaves are NIL nodes)
  • Every red node has two black child nodes. (There cannot be two consecutive red nodes on any path from each leaf to the root)
  • All paths from any node to each leaf contain the same number of black nodes.
  • These constraints enforce a key property of red-black trees: the longest possible path from the root to a leaf is no longer than twice as long as the shortest possible path. The result is a tree that is roughly balanced. Because operations such as inserting, deleting, and looking up a value require worst-case time proportional to the height of the tree, this theoretical upper limit on height allows red-black trees to be efficient in the worst case, unlike ordinary binary search trees.
  • It is Property 4 that ensures this result because there cannot be two consecutive red nodes on the path. The shortest possible path has all black nodes, and the longest possible path has alternating red and black nodes. Since by Property 5 all longest paths have the same number of black nodes, this shows that no path can be more than twice as long as any other path.
  • Because the red-black tree is a specialized binary search tree, the read-only operations on the red-black tree are the same as those on the ordinary binary search tree.

B-Tree

  • Leaf nodes have the same depth, and the leaf node pointer is empty
  • All elements are unique
  • The data indexes in the nodes are arranged in ascending order from left to right.

B-tree data structure.png

B+Tree

  • Non-leaf nodes do not store data, only indexes (redundancy), and can store more indexes
  • Leaf nodes contain all index fields
  • Leaf nodes are linked with pointers to improve the performance of interval access (which can improve the efficiency of range search)

B+ tree data structure.png

Key words: Order within nodes, leaf node pointer links, non-leaf node storage index (redundant)

Query the size of the data page of the mysql index:

mysql> show global status like 'Innodb_page_size';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+

Why set 16kb?

Hash

  • A hash calculation of the index key can locate the location of the data storage.
  • Many times, hash indexes are more efficient than B+ tree indexes.
  • Only "=" is supported. "in" does not support range queries.
  • There is a hash conflict problem


Hash data structure.png

index

InnoDB Index Implementation (Clustering)

The table data file itself is an index structure file organized by B+Tree

Clustered index - leaf nodes contain complete data records

Why does an InnoDb table have to have a primary key, and is it recommended to use an integer auto-increment primary key?

  • If no index is set, MySQL will select a column with unique data as the primary key index if no such column is found. What I would do is create a hidden column like rowid.
  • The table data file is maintained according to the B+Tree data structure, and the leaf node maintains the data of the row. So there must be a primary key.
  • Integers are more convenient for B+Tree sorting. If they are self-increasing, the data structure can be stored faster and sequentially without the need for a large number of tree balancing operations.

Why do leaf nodes of non-primary key index structures store primary key values?

  • Consistency: let the primary key index succeed first, then update the non-primary key index relationship
  • Save storage space.

Primary key index diagram:


InnoDB index implementation.png

Non-primary key index diagram picture

If the query is based on name = Alice:

  1. Use the non-primary key index to query and get the information (Alice, 18) after the query. In fact, this is also a non-clustered index
  2. Then perform a back-table query and query the table again using the primary key.

Two data files:

.frm mainly stores table structure information

.ibd mainly stores indexes and data

MyISAM index files (nonclustered)

Index files and data files are separate (non-clustered)


MyISAM storage engine index.png

Three data files:

.frm data structure file

.myd files are mainly used to store data

.myi files mainly store index information

Clustered and non-clustered indexes

feature:

Clustering/non-clustering mainly refers to whether the index file is together with the data file.

In terms of query efficiency, clustered indexes will not query across files, which will be faster.

Joint/Compound Indexes

Multiple fields are organized into a common index


Combined index.png

Why is the leftmost prefix principle used in this way?

The indexed data is sorted and cannot be used if fields are skipped.

Example:

where name = 'Jeff' and age = 22 -- hits the index where age = 30 and postatin='manager' -- does not hit the index where postation = 'dev' -- does not hit the index

References

Baidu Encyclopedia

Summarize

This is the end of this article about MySQL index data structure. For more relevant MySQL index data structure content, please search 123WORDPRESS.COM’s previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Detailed analysis of MySQL index structure
  • Detailed analysis of MySQL index transactions
  • Analysis of the Principle of MySQL Index Length Limit
  • In-depth analysis of MySQL indexes
  • Analyzing the role of MySQL indexes

<<:  A brief introduction to the general process of web front-end web development

>>:  Sample code for easily implementing page layout using flex layout

Recommend

Detailed analysis of the chmod command to modify file permissions under Linux

Use the Linux chmod command to control who can ac...

Detailed explanation of Linux server status and performance related commands

Server Status Analysis View Linux server CPU deta...

A brief discussion on the pitfalls of react useEffect closure

Problem code Look at a closure problem code cause...

Detailed explanation of MySQL DEFINER usage

Table of contents Preface: 1.Brief introduction t...

Detailed explanation of several solutions for JavaScript interruption requests

Table of contents 1 Promise Interrupt the call ch...

Steps to modify the MySQL database data file path under Linux

After installing the MySQL database using the rpm...

JavaScript implements bidirectional linked list process analysis

Table of contents 1. What is a doubly linked list...

Summary of data interaction between Docker container and host

Preface When using Docker in a production environ...

Key issues and solutions for web page access speed

<br /> The website access speed can directly...

An article to deal with Mysql date and time functions

Table of contents Preface 1. Get the current time...

React hooks introductory tutorial

State Hooks Examples: import { useState } from &#...

Implementation of vertical centering with unknown height in CSS

This article mainly introduces the implementation...