Index in MySQL

Index in MySQL

Preface

Let's get straight to the point. The following mind map is what I'm going to talk about now. You can get an impression first.


  • Common index types (implementation level)
  • Index Type (Application Level)
  • Clustered and non-clustered indexes
  • Covering Index
  • Best index usage strategy

1. Common index types (implementation level)

First of all, let’s not talk about how MySQL implements indexing. Let’s talk about it later. If we are asked to design the index of the database, how should we design it?

Let’s first think about what effect the index is trying to achieve? In fact, we just want to implement a strategy for quickly finding data, so the implementation of the index is essentially a search algorithm .

But it is different from ordinary search because our data has the following characteristics:

1. The amount of data stored is very, very large

2. And it is constantly changing dynamically

Therefore, these two characteristics need to be taken into account when implementing the index. We need to find the most suitable data structure algorithm to implement the search function.

Let's take a look at the common search strategies, as shown below:

Due to the two characteristics mentioned above, we first exclude static search algorithms.

As for the search tree, we have two choices: binary tree and multi-tree:

Binary tree: If we choose a binary tree, due to the huge amount of data, the depth of the binary tree will become very large, our index tree will become a towering tree, and each query will cause a lot of disk IO.

Multi-branch tree: Multi-branch tree solves the problem of large tree depth, so should we choose B-tree or B+ tree?

B-tree from Wikipediazh.wikipedia.org/wiki/B%2Btree

B+ tree from Wikipediazh.wikipedia.org/wiki/B%2Btree


From the above figure, we can see that the leaf nodes of the B+ tree store all the index values, and the leaf nodes are interconnected in the form of linked lists, so we only need to traverse from the leftmost linked list to find all the values. The most common use is range search, but the B-tree does not meet this range search, or the implementation is particularly complex, so MySQL finally chose to use the B+ tree to implement this function.

1.1 B-Tree Index (B+ Tree)

First of all, although it is officially called B-Tree index in MySQL, it uses B+ tree data structure.

B-tree index can speed up the access to data. It does not need to scan the whole table. Instead, it searches down from the root node of the index tree layer by layer. The root node stores the index value and the pointer to the next node.

Let's take a look at how the data of a single-column index is organized.

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 key(`uid`)
);

The User table above creates an index for the uid column. So how does the storage engine manage the index when inserting uid (96-102) into the table? Look at the index tree below

1. Store all index values ​​in leaf nodes. Non-leaf node values ​​are used to locate leaf nodes containing target values ​​more quickly.

2. The values ​​of leaf nodes are ordered

3. Leaf nodes are associated in the form of linked lists

Next, let’s take a look at how the data of a multi-column (joint) index is organized.

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 key(`uid`,`name`)
);

A joint index key (uid, name) is created for the User table. In this case, its index tree is as shown in the following figure.

The characteristics are the same as single-column indexes, the difference is its sorting. If the first field is the same, it will be sorted by the second index field.

How to quickly find data using B-tree?

For the B-tree index of the InnoDb storage engine, the following steps are followed to find the row data through the index:

  • If a clustered index (primary key) is used, the leaf node contains row data and can be returned directly.
  • If a non-clustered index (normal index) is used, the primary key is stored in the leaf node, and the clustered index above is queried based on the primary key, and the data is finally returned.

For the B-tree index of the MyISAM storage engine, the following steps are followed to find the row data through the index:

  • In the leaf nodes of the MyISAM index tree, neither the primary key nor the row data is stored except the index value. Instead, a pointer to the row data is stored, and data is queried from the table file based on this pointer.

1.2 Hash index (hash table)

The hash index is implemented based on the hash table and will only take effect if all columns are exactly matched.

That is to say, if there is a hash index key (col1,col2), then only when both col1 and col2 are used each time can it take effect. Because when a hash index is generated, it is implemented by taking the hash value of all index columns based on a hash function.

As shown in the figure below, there is a hash index key (name)

When we execute mysql> select * from User where name='張三'; how can we use the hash index to quickly search?

  1. The first step is to calculate the hash value, hash(Zhang San) = 1287
  2. The second step is to locate the row number, for example, key=1287 corresponds to row number 3
  3. The third step is to find the specified row and compare the name column value to see if it is Zhang San.

2. Common index types (application level)

Primary key index

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 primary key(`uid`)
);

The primary key index is unique and is usually set with the table ID. A table can only have one primary key index, which is the difference between it and the unique index.

Unique Index

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 unique key(`name`)
);

The unique index is mainly used for unique constraints in business. The difference between it and the primary key index is that a table can have multiple unique indexes.

Single column index

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 key(`name`)
);

Indexing a field

Joint Index

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 key(`name`,`uid`)
);

Two or more fields are combined to form an index. When using it, you need to pay attention to the leftmost matching principle!

There are other less commonly used ones that I won’t introduce~

3. Clustered index and non-clustered index

What is a clustered index?

A clustered index is one in which the index and row data are stored together. That is, what is stored on the leaf node of a B+ tree is not only its index value, but also the data of a corresponding row. You will know when you look at the picture later.

A clustered index is not an index, but a way of organizing data storage! ! !

crreate table test(
 col1 int not null,
 col2 int not null,
 PRIMARY KEY(col1),
 KEY(col2)
);

As shown above, the table test has two indexes, namely the primary key col1 and the common index col2. So what is the relationship between these two indexes and clustered and non-clustered?

A clustered index and a non-clustered index (secondary index) will be generated, that is, two index trees will be organized. The primary key index generates a tree for the clustered index and a tree for the nonclustered index with col2 as the index.

InnoDb will implement the clustered index through the primary key. If there is no primary key, a unique non-empty index will be selected to implement it. If there is no unique non-empty index, a primary key is implicitly generated.

Let's take a look at how clustered indexes and non-clustered indexes distribute data on the index tree. The picture is taken from "High Performance Nysql"

The following figure shows how the data of a clustered index is organized. Col1 is the clustered index tree of the primary key index

The index column is the primary key col1

It can be seen that in addition to storing the index value column col1 (3~99~4700), the leaf node also stores the values ​​of other columns, such as column col2 (92~8~13). If there are other columns, they will also be stored. In other words, the clustered index tree stores a row of data corresponding to a certain index value on the leaf node.

The following figure shows the data organization of a non-clustered index (secondary index).

The index column is col2

Unlike clustered indexes, non-clustered indexes only store primary key values ​​in addition to index values ​​on the tree leaf nodes. The clustered index stores a row of data.

If there is a sql statement select * from test where col2=93;

The above statement will go through the index tree search process twice

1. The first step is to find the leaf node containing col2=93 from the index tree of the non-clustered index and locate the primary key 3 of the row

2. The second step is to locate the leaf node containing primary key = 3 in the clustered index based on primary key 3 and return all row data.

The above is based on the InnoDb storage engine. MyISAM does not support clustered indexes because its data files and index files are stored independently of each other. The leaf nodes of the index tree of the MyISAM storage engine do not store the primary key value, but an address or pointer to the corresponding row, and then search from the table data file, as shown in the figure below.

in conclusion:

  • Clustered Index:

Usually implemented by a primary key or a non-empty unique index, the leaf node stores a whole row of data

  • Nonclustered index:

Also known as secondary index, it is the common index we often use. The leaf node stores the index value and the primary key value.

4. Covering Index

A covering index means that the index contains all the fields that need to be queried.

create table User(
`name` varchar(50) not null,
`uid` int(4) not null,
`gender` int(2) not null,
 key(`uid`,`name`)
);

If the table User has three fields User (name, uid, gender), and has a joint index key (name, uid), then

The covering index is used when executing the following SQL query.

select name,uid from User where name in ('a','b') and uid >= 98 and uid <=100 ;

The above SQL statement uses the joint index key (name, uid), and only needs to search the name and uid fields, so a covering index is used. What are the benefits of covering indexes? First look at the following picture

The figure above is the index tree corresponding to the joint index key (name, uid). From the figure, we can see that if we only need to query the two fields (name, uid), we can get the data we need to query from the index tree. There is no need to find the index value and then locate the corresponding row data from the table data file.

Benefits of covering indexes

1. Avoid secondary query of primary key index (clustering)

2. Since there is no need to query the table (from the table data file), the load of MySQL cache is greatly improved

In short, the performance of reading data is greatly improved

5. Best Index Usage Strategy

Finally, let’s talk about the pitfall avoidance guide when using the index.

Independent columns

Independent columns do not refer to single-column indexes, but rather to the fact that the index column cannot be part of an expression or function.

select * FROM test where col1 + 1 = 100; // cannot be part of an expression select * FROM test where ABS(col1) = 100; // cannot be part of a function

Leftmost matching principle

Suppose there is a joint index key (col1,col2). Then the following query is invalid index

select * from test where col2 = 3;
select * from test where col1 like '%3';

As for the leftmost matching principle, if you think about the association between the leaf nodes of the B+ tree, you will almost understand why the leftmost matching principle is needed, because the leaf nodes of B+ are associated from left to right in the form of a linked list. When we query the index, we must either use a range query or have a clear starting index value on the left. We cannot skip or use an ambiguous query such as like '%XYZ'.

Index value cannot be null

A single-column index with a null value will cause the index to be invalid

A multi-column index will be invalid if any column has a null value

Use clustered indexes and covering indexes to greatly improve read performance

Because the required fields are already available in the index tree of the clustered index and covering index, there is no need to return to the table file for query, thus improving the query speed.

Using short indexes

If you are searching for a long string, you only need to match a prefix length, which can save a lot of index space.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM.

You may also be interested in:
  • MySQL index for beginners
  • Solve MySQL deadlock routine by updating different indexes
  • Understanding MySQL deadlock routines through unique index S lock and X lock
  • Share some key interview questions about MySQL index
  • A brief talk about Mysql index and redis jump table
  • MySQL Learning (VII): Detailed Explanation of the Implementation Principle of Innodb Storage Engine Index
  • How to add index to mysql using shell script
  • Solutions to MySQL batch insert and unique index problems
  • Guide to Efficient Use of MySQL Indexes

<<:  Tutorial on how to install htop on CentOS 8

>>:  jQuery implements ad display and hide animation

Recommend

Usage and execution process of http module in node

What is the role of http in node The responsibili...

A brief discussion on the $notify points of element

My original intention was to encapsulate the $not...

MySQL 5.7.11 zip installation and configuration method graphic tutorial

1. Download the MySQL 5.7.11 zip installation pac...

MySQL big data query optimization experience sharing (recommended)

Serious MySQL optimization! If the amount of MySQ...

Detailed tutorial on installing Anaconda3 on Ubuntu 18.04

Anaconda refers to an open source Python distribu...

A Brief Analysis of MySQL - MVCC

Version Chain In InnoDB engine tables, there are ...

How to configure Basic Auth login authentication in Nginx

Sometimes we build a file server through nginx, w...

Why does using limit in MySQL affect performance?

First, let me explain the version of MySQL: mysql...

Webpack file packaging error exception

Before webpack packaging, we must ensure that the...

Introduction to install method in Vue

Table of contents 1. Globally registered componen...

Vue implements an Input component that gets the key display shortcut key effect

I encountered a requirement to customize shortcut...

Example of using supervisor to manage nginx+tomcat containers

need: Use docker to start nginx + tomcat dual pro...

Vue uses the Element el-upload component to step on the pit

Table of contents 1. Basic Use 2. Image quantity ...