Preface Let's get straight to the point. The following mind map is what I'm going to talk about now. You can get an impression first.
1. Common index types (implementation level) First of all, let’s not talk about how MySQL implements indexing. Let’s talk about it later. If we are asked to design the index of the database, how should we design it? Let’s first think about what effect the index is trying to achieve? In fact, we just want to implement a strategy for quickly finding data, so the implementation of the index is essentially a search algorithm . But it is different from ordinary search because our data has the following characteristics: 1. The amount of data stored is very, very large 2. And it is constantly changing dynamically Therefore, these two characteristics need to be taken into account when implementing the index. We need to find the most suitable data structure algorithm to implement the search function. Let's take a look at the common search strategies, as shown below: Due to the two characteristics mentioned above, we first exclude static search algorithms. As for the search tree, we have two choices: binary tree and multi-tree: Binary tree: If we choose a binary tree, due to the huge amount of data, the depth of the binary tree will become very large, our index tree will become a towering tree, and each query will cause a lot of disk IO. Multi-branch tree: Multi-branch tree solves the problem of large tree depth, so should we choose B-tree or B+ tree? B-tree from Wikipediazh.wikipedia.org/wiki/B%2Btree B+ tree from Wikipediazh.wikipedia.org/wiki/B%2Btree From the above figure, we can see that the leaf nodes of the B+ tree store all the index values, and the leaf nodes are interconnected in the form of linked lists, so we only need to traverse from the leftmost linked list to find all the values. The most common use is range search, but the B-tree does not meet this range search, or the implementation is particularly complex, so MySQL finally chose to use the B+ tree to implement this function. 1.1 B-Tree Index (B+ Tree) First of all, although it is officially called B-Tree index in MySQL, it uses B+ tree data structure. B-tree index can speed up the access to data. It does not need to scan the whole table. Instead, it searches down from the root node of the index tree layer by layer. The root node stores the index value and the pointer to the next node. Let's take a look at how the data of a single-column index is organized. create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, key(`uid`) ); The User table above creates an index for the uid column. So how does the storage engine manage the index when inserting uid (96-102) into the table? Look at the index tree below 1. Store all index values in leaf nodes. Non-leaf node values are used to locate leaf nodes containing target values more quickly. 2. The values of leaf nodes are ordered 3. Leaf nodes are associated in the form of linked lists Next, let’s take a look at how the data of a multi-column (joint) index is organized. create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, key(`uid`,`name`) ); A joint index key (uid, name) is created for the User table. In this case, its index tree is as shown in the following figure. The characteristics are the same as single-column indexes, the difference is its sorting. If the first field is the same, it will be sorted by the second index field. How to quickly find data using B-tree? For the B-tree index of the InnoDb storage engine, the following steps are followed to find the row data through the index:
For the B-tree index of the MyISAM storage engine, the following steps are followed to find the row data through the index:
1.2 Hash index (hash table) The hash index is implemented based on the hash table and will only take effect if all columns are exactly matched. That is to say, if there is a hash index key (col1,col2), then only when both col1 and col2 are used each time can it take effect. Because when a hash index is generated, it is implemented by taking the hash value of all index columns based on a hash function. As shown in the figure below, there is a hash index key (name) When we execute
2. Common index types (application level) Primary key index create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, primary key(`uid`) ); The primary key index is unique and is usually set with the table ID. A table can only have one primary key index, which is the difference between it and the unique index. Unique Index create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, unique key(`name`) ); The unique index is mainly used for unique constraints in business. The difference between it and the primary key index is that a table can have multiple unique indexes. Single column index create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, key(`name`) ); Indexing a field Joint Index create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, key(`name`,`uid`) ); Two or more fields are combined to form an index. When using it, you need to pay attention to the leftmost matching principle! There are other less commonly used ones that I won’t introduce~ 3. Clustered index and non-clustered index What is a clustered index? A clustered index is one in which the index and row data are stored together. That is, what is stored on the leaf node of a B+ tree is not only its index value, but also the data of a corresponding row. You will know when you look at the picture later. A clustered index is not an index, but a way of organizing data storage! ! ! crreate table test( col1 int not null, col2 int not null, PRIMARY KEY(col1), KEY(col2) ); As shown above, the table test has two indexes, namely the primary key col1 and the common index col2. So what is the relationship between these two indexes and clustered and non-clustered? A clustered index and a non-clustered index (secondary index) will be generated, that is, two index trees will be organized. The primary key index generates a tree for the clustered index and a tree for the nonclustered index with col2 as the index. InnoDb will implement the clustered index through the primary key. If there is no primary key, a unique non-empty index will be selected to implement it. If there is no unique non-empty index, a primary key is implicitly generated. Let's take a look at how clustered indexes and non-clustered indexes distribute data on the index tree. The picture is taken from "High Performance Nysql" The following figure shows how the data of a clustered index is organized. Col1 is the clustered index tree of the primary key index The index column is the primary key col1 It can be seen that in addition to storing the index value column col1 (3~99~4700), the leaf node also stores the values of other columns, such as column col2 (92~8~13). If there are other columns, they will also be stored. In other words, the clustered index tree stores a row of data corresponding to a certain index value on the leaf node. The following figure shows the data organization of a non-clustered index (secondary index). The index column is col2 Unlike clustered indexes, non-clustered indexes only store primary key values in addition to index values on the tree leaf nodes. The clustered index stores a row of data. If there is a sql statement The above statement will go through the index tree search process twice 1. The first step is to find the leaf node containing col2=93 from the index tree of the non-clustered index and locate the primary key 3 of the row 2. The second step is to locate the leaf node containing primary key = 3 in the clustered index based on primary key 3 and return all row data. The above is based on the InnoDb storage engine. MyISAM does not support clustered indexes because its data files and index files are stored independently of each other. The leaf nodes of the index tree of the MyISAM storage engine do not store the primary key value, but an address or pointer to the corresponding row, and then search from the table data file, as shown in the figure below. in conclusion:
Usually implemented by a primary key or a non-empty unique index, the leaf node stores a whole row of data
Also known as secondary index, it is the common index we often use. The leaf node stores the index value and the primary key value. 4. Covering Index A covering index means that the index contains all the fields that need to be queried. create table User( `name` varchar(50) not null, `uid` int(4) not null, `gender` int(2) not null, key(`uid`,`name`) ); If the table User has three fields User (name, uid, gender), and has a joint index key (name, uid), then The covering index is used when executing the following SQL query. select name,uid from User where name in ('a','b') and uid >= 98 and uid <=100 ; The above SQL statement uses the joint index key (name, uid), and only needs to search the name and uid fields, so a covering index is used. What are the benefits of covering indexes? First look at the following picture The figure above is the index tree corresponding to the joint index key (name, uid). From the figure, we can see that if we only need to query the two fields (name, uid), we can get the data we need to query from the index tree. There is no need to find the index value and then locate the corresponding row data from the table data file. Benefits of covering indexes 1. Avoid secondary query of primary key index (clustering) 2. Since there is no need to query the table (from the table data file), the load of MySQL cache is greatly improved In short, the performance of reading data is greatly improved 5. Best Index Usage Strategy Finally, let’s talk about the pitfall avoidance guide when using the index. Independent columns Independent columns do not refer to single-column indexes, but rather to the fact that the index column cannot be part of an expression or function. select * FROM test where col1 + 1 = 100; // cannot be part of an expression select * FROM test where ABS(col1) = 100; // cannot be part of a function Leftmost matching principle Suppose there is a joint index key (col1,col2). Then the following query is invalid index select * from test where col2 = 3; select * from test where col1 like '%3'; As for the leftmost matching principle, if you think about the association between the leaf nodes of the B+ tree, you will almost understand why the leftmost matching principle is needed, because the leaf nodes of B+ are associated from left to right in the form of a linked list. When we query the index, we must either use a range query or have a clear starting index value on the left. We cannot skip or use an ambiguous query such as like '%XYZ'. Index value cannot be null A single-column index with a null value will cause the index to be invalid A multi-column index will be invalid if any column has a null value Use clustered indexes and covering indexes to greatly improve read performance Because the required fields are already available in the index tree of the clustered index and covering index, there is no need to return to the table file for query, thus improving the query speed. Using short indexes If you are searching for a long string, you only need to match a prefix length, which can save a lot of index space. Summarize The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM. You may also be interested in:
|
<<: Tutorial on how to install htop on CentOS 8
>>: jQuery implements ad display and hide animation
What is the role of http in node The responsibili...
My original intention was to encapsulate the $not...
1. Download the MySQL 5.7.11 zip installation pac...
Serious MySQL optimization! If the amount of MySQ...
Anaconda refers to an open source Python distribu...
Version Chain In InnoDB engine tables, there are ...
Preface I need to add a synchronized scrolling fe...
Sometimes we build a file server through nginx, w...
First, let me explain the version of MySQL: mysql...
Before webpack packaging, we must ensure that the...
Table of contents 1. Globally registered componen...
1 Requirements Overview The data of multiple tabl...
I encountered a requirement to customize shortcut...
need: Use docker to start nginx + tomcat dual pro...
Table of contents 1. Basic Use 2. Image quantity ...