MySQL Learning (VII): Detailed Explanation of the Implementation Principle of InnoDB Storage Engine Indexes

Overview

In a database, an index speeds up data lookups in the same way that a table of contents speeds up finding a chapter in a book. For a SQL query, an index lets the engine quickly skip rows that do not satisfy the conditions and locate the rows that do, so it does not have to scan the entire table.

In the InnoDB storage engine, indexes are mainly based on B+ trees. Non-leaf nodes store only index keys. Leaf nodes store the data records themselves in the primary key index (the clustered index), or the indexed column values together with the corresponding primary key values in a secondary index. All leaf nodes sit at the same level and are connected by pointers into a doubly linked list, which makes it easy to traverse all records or a range of records.

B-tree, B+tree

Both the B-tree and the B+ tree are balanced multi-way search trees. Storing many keys in each node keeps the tree short, and splitting and merging nodes (sometimes redistributing keys between siblings, which some texts call rotation) keeps it balanced. A shorter tree means fewer disk accesses per lookup.
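To see why a high fan-out keeps the tree short, the following minimal sketch computes how many rows a B+ tree of a given height can hold. The fan-out of 1000 and the 16 rows per leaf page are hypothetical round numbers chosen for illustration, not measured InnoDB values, and the function names are made up for this example.

```python
def max_rows(fanout: int, rows_per_leaf: int, height: int) -> int:
    """Upper bound on the rows a B+ tree of the given height can index.

    Every level above the leaves multiplies the number of reachable
    leaf pages by `fanout`; each leaf page holds `rows_per_leaf` rows.
    """
    if height < 1:
        raise ValueError("height must be at least 1")
    return fanout ** (height - 1) * rows_per_leaf


def min_height(fanout: int, rows_per_leaf: int, total_rows: int) -> int:
    """Smallest height whose capacity is at least `total_rows`."""
    height = 1
    while max_rows(fanout, rows_per_leaf, height) < total_rows:
        height += 1
    return height


if __name__ == "__main__":
    # Hypothetical fan-out and leaf capacity, for illustration only.
    for h in (1, 2, 3, 4):
        print(h, max_rows(fanout=1000, rows_per_leaf=16, height=h))
    print(min_height(fanout=1000, rows_per_leaf=16, total_rows=10_000_000))
```

Under these assumptions, each extra level multiplies the capacity by the fan-out, which is why a tree only three or four levels deep can index a very large table.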

A major difference between the two is that a B+ tree keeps all records in its leaf nodes and links those leaf nodes to their neighbors with pointers (in InnoDB, as a doubly linked list), whereas a B-tree also stores records in its internal nodes. This makes the B+ tree well suited to range searches. For details, please refer to:

Data Structure - Tree (III): Multi-way Search Tree B-tree, B+ tree

The clustered and non-clustered indexes of the InnoDB storage engine are both implemented as B+ trees.

Primary key index

The InnoDB storage engine uses the primary key index as the table's clustered index. In a clustered index, the non-leaf nodes store primary key values as search keys, and the leaf nodes (the data pages) store the actual data records, ordered by key from left to right. The clustered index is therefore the way the table's data is stored, which is why each table can have only one. For the same reason, an InnoDB table is also called an index-organized table. (The original article illustrates this structure with a picture from "MySQL Technology Insider: InnoDB Storage Engine".)

When a query looks up rows by primary key, that is, when the key column of the EXPLAIN output shows PRIMARY, the lookup is at its most efficient: the leaf node holds the data record itself, so the row can be returned directly. A non-clustered index, by contrast, needs an additional lookup in the primary key index (a back-to-table query) to fetch the record.

Next, consider ORDER BY. If the ORDER BY column is the primary key, then whether the order is ASC or DESC, the B+ tree of the primary key index is already ordered, so the storage engine returns rows in primary key order and the MySQL server layer does not need to sort them, which improves performance. If the Extra column of the EXPLAIN output shows Using filesort, the sort is performed at the MySQL server layer, which may require an in-memory sort buffer or temporary files on disk. Such queries are usually worth optimizing.

For range searches based on primary keys, since the leaf nodes of the clustered index are connected using a bidirectional linked list according to the order of the primary keys, data records in a certain range can be found quickly.
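The range search described above can be sketched with a toy model of linked leaf pages. This is an illustration only: the `LeafPage` class and `range_scan` function are hypothetical names, real InnoDB pages are far more elaborate, and the starting page is passed in directly here, whereas a real lookup would find it by descending from the root.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class LeafPage:
    """Toy leaf page: sorted (primary_key, row) pairs plus sibling links."""
    records: List[Tuple[int, str]]
    prev: Optional["LeafPage"] = field(default=None, repr=False)
    next: Optional["LeafPage"] = field(default=None, repr=False)


def link_pages(pages: List[LeafPage]) -> None:
    """Connect neighboring pages in both directions."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left


def range_scan(start: LeafPage, low: int, high: int) -> Iterator[Tuple[int, str]]:
    """Yield records with low <= key <= high, walking right from `start`."""
    page: Optional[LeafPage] = start
    while page is not None:
        for key, row in page.records:
            if key > high:
                return  # keys are ordered, so nothing further can match
            if key >= low:
                yield key, row
        page = page.next


if __name__ == "__main__":
    pages = [
        LeafPage([(1, "a"), (2, "b")]),
        LeafPage([(3, "c"), (4, "d")]),
        LeafPage([(5, "e"), (6, "f")]),
    ]
    link_pages(pages)
    print(list(range_scan(pages[0], 2, 5)))
```

Once the first matching page is found, the scan only follows sibling pointers and never has to go back up the tree.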

Auxiliary index

An auxiliary index, also known as a secondary index, is a non-clustered index, usually created to speed up particular queries: when a query filters on the indexed column, the auxiliary index avoids a full table scan. Because an auxiliary index is not the clustered index, a table can have several of them. Each has the following structure:

The non-leaf nodes of an auxiliary index store the keys of the indexed column, and the leaf nodes store the primary key value of the matching row in the clustered index (the primary key index). If the auxiliary index does not cover every column the query needs, locating an entry in the auxiliary index is only the first step: the primary key value found there is then used to search the clustered index, descend to the corresponding leaf page, and read the data record. The whole process therefore involves two searches, first in the auxiliary index and then in the clustered index. The second step is known as a back-to-table query.
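The two-step lookup can be mimicked with plain Python containers. This is a sketch under simplifying assumptions: a dict stands in for the clustered index, a sorted list of (user_id, primary_key) pairs stands in for the auxiliary index, and the table contents and the `lookup_by_user` helper are hypothetical.

```python
from bisect import bisect_left

# Clustered index stand-in: primary key -> full row.
clustered = {
    1: {"order_id": 1, "user_id": 7, "amount": 30},
    2: {"order_id": 2, "user_id": 3, "amount": 12},
    3: {"order_id": 3, "user_id": 7, "amount": 55},
    4: {"order_id": 4, "user_id": 5, "amount": 20},
}

# Auxiliary index stand-in on user_id: sorted (user_id, primary_key) pairs.
secondary = sorted((row["user_id"], pk) for pk, row in clustered.items())


def lookup_by_user(user_id: int) -> list:
    """Return the full rows for `user_id` using the two-step lookup."""
    # Step 1: search the auxiliary index for the matching primary keys.
    lo = bisect_left(secondary, (user_id,))
    hi = bisect_left(secondary, (user_id + 1,))
    primary_keys = [pk for _, pk in secondary[lo:hi]]
    # Step 2: one clustered-index lookup per primary key (back to table).
    return [clustered[pk] for pk in primary_keys]


if __name__ == "__main__":
    for row in lookup_by_user(7):
        print(row)
```

Step 2 runs once per matching entry, which is why the number of matching rows drives the cost of a query through an auxiliary index.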

For example:

  1. If the B+ tree of the auxiliary index has a height of 3, locating the leaf node takes 3 disk IOs (assuming none of the pages are cached). That leaf node contains the primary key value of the matching row in the clustered index;
  2. The data record is then looked up in the clustered index using that primary key value. If the B+ tree of the clustered index also has a height of 3, another 3 disk IOs are needed to reach the leaf page that holds the actual data record.

The process above requires 6 disk IOs in total for a single row. If many rows match, the number of disk IOs grows in proportion to the number of rows, and query performance drops. Auxiliary indexes should therefore be created on columns that filter well, that is, columns with few duplicate values.
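The arithmetic of the example can be written down as a rough worst-case estimate. The sketch assumes that no page is cached, that all matching entries are found with a single descent of the auxiliary index, and that every matching row costs one full descent of the clustered index. The function name is made up for this illustration, and real workloads usually do better because upper-level pages tend to stay in memory.

```python
def back_to_table_page_reads(
    matching_rows: int,
    secondary_height: int = 3,
    clustered_height: int = 3,
) -> int:
    """Worst-case page reads for a query that goes back to the table.

    One descent of the auxiliary index, then one descent of the
    clustered index for every matching row.
    """
    if matching_rows < 0:
        raise ValueError("matching_rows must not be negative")
    return secondary_height + matching_rows * clustered_height


if __name__ == "__main__":
    for rows in (1, 10, 1000):
        print(rows, back_to_table_page_reads(rows))
```

With one matching row the estimate is the 6 reads from the example, and with 1000 matching rows it is 3003, so the cost scales with the number of rows that need a back-to-table lookup.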

Cardinality: The data duplication of the index column

As the analysis above shows, when a query through an auxiliary index needs back-to-table lookups and matches many rows, fetching the data takes a large amount of disk IO. Such an index does not improve query performance and can even reduce it. In addition, when the MySQL optimizer estimates that a query will return many rows, it may ignore the index and perform a full table scan instead. The columns chosen for an auxiliary index should therefore have few duplicate values, so that a typical query returns only a small number of rows. If a column contains too many duplicate values, consider not creating a secondary index on it.

Specifically, you can use SHOW INDEX FROM to determine the Cardinality value:

mysql> SHOW INDEX FROM store_order;
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| store_order |          0 | PRIMARY    |            1 | store_id    | A         |         201 | NULL     | NULL   |      | BTREE      |         |               |
| store_order |          1 | idx_expire |            1 | expire_date | A         |          68 | NULL     | NULL   | YES  | BTREE      |         |               |
| store_order |          1 | idx_ul     |            1 | ul          | A         |          22 | NULL     | NULL   | YES  | BTREE      |         |               |
+-------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.01 sec)

Cardinality is an estimate of the number of unique values in the index column. If it is close to the number of rows in the table, the column has few duplicate values and filters well. If the ratio of Cardinality to the total number of rows is very small, as with a gender column that contains only the two values "male" and "female", the column holds a large number of duplicate values, and you should consider whether to drop the index.
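As a rough illustration, the ratio can be computed from the SHOW INDEX output above. The sketch assumes that the Cardinality of PRIMARY (201) approximates the number of rows in store_order. Since Cardinality is itself an estimate, the ratios are only indicative, and the `selectivity` helper is a hypothetical name.

```python
def selectivity(cardinality: int, total_rows: int) -> float:
    """Ratio of distinct values to rows; values near 1.0 filter best."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return cardinality / total_rows


# Cardinality values copied from the SHOW INDEX output for store_order.
cardinalities = {"PRIMARY": 201, "idx_expire": 68, "idx_ul": 22}

# Assumption: the primary key cardinality approximates the row count.
total_rows = cardinalities["PRIMARY"]

if __name__ == "__main__":
    for name, cardinality in cardinalities.items():
        print(f"{name}: {selectivity(cardinality, total_rows):.2f}")
```

The lower the ratio, the more rows a single index value matches on average, and the more back-to-table lookups a query on that value is likely to need.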

Covering Index

  1. Because back-to-table queries are expensive, you can reduce them by putting every column a query needs into the auxiliary index, for example with a joint index. All the data the query needs can then be read from the auxiliary index alone, with no need to go back to the table for the complete row, which improves performance. This mechanism is called a covering index. Because the leaf pages of an auxiliary index already contain the primary key value, a query that returns only the indexed columns and the primary key is covered even if the primary key is not part of the index definition.
  2. When you analyze a query with EXPLAIN, Using index in the Extra column means that a covering index is used to return the data, and query performance is high.
  3. Indexes also increase the cost of modifying data: whenever rows are inserted, updated, or deleted, the corresponding auxiliary indexes must be updated too. The design therefore has to balance query speed against write overhead.
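Whether an index covers a query comes down to a set comparison, which the following sketch makes explicit. It is a simplified model rather than the optimizer's actual logic, the `is_covered` helper is hypothetical, and the example assumes a table whose primary key is order_id and which has an extra column named amount.

```python
from typing import Iterable


def is_covered(
    needed_columns: Iterable[str],
    index_columns: Iterable[str],
    primary_key_columns: Iterable[str],
) -> bool:
    """True if every needed column is available in the auxiliary index.

    Auxiliary index leaf entries carry the indexed columns plus the
    primary key, so both sets count as available.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(needed_columns) <= available


if __name__ == "__main__":
    index = ("user_id", "buy_date")
    primary_key = ("order_id",)  # assumed primary key
    print(is_covered(("user_id", "order_id"), index, primary_key))
    print(is_covered(("user_id", "amount"), index, primary_key))
```

In this model the first query is covered and the second is not, because amount lives only in the clustered index and has to be fetched with a back-to-table lookup.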

Joint indexes and leftmost-prefix matching

  1. A joint index is built on multiple columns. For example, (a,b,c) uses columns a, b, and c together as the index key. Because of how the B+ tree orders its keys, lookups must match a leftmost prefix of the index, so the index can serve queries as if three indexes existed: (a), (a,b), and (a,b,c).
  2. Therefore, when designing a joint index, besides considering whether it can act as a covering index, you also need to consider the order of the columns. As a rule of thumb, the column that is queried most often and filters best (has the fewest duplicate values) goes first, that is, on the left.
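The leftmost-prefix rule can be expressed as a small function. This is a deliberately simplified model that only considers equality conditions and ignores range conditions and any optimizer-specific exceptions. The `usable_prefix` helper is a hypothetical name.

```python
from typing import List, Sequence, Set


def usable_prefix(index_columns: Sequence[str], equality_columns: Set[str]) -> List[str]:
    """Leading index columns that equality conditions can use.

    Matching stops at the first index column that has no condition,
    because entries are ordered by the first column, then the second,
    and so on.
    """
    usable: List[str] = []
    for column in index_columns:
        if column not in equality_columns:
            break
        usable.append(column)
    return usable


if __name__ == "__main__":
    index = ("a", "b", "c")
    print(usable_prefix(index, {"a", "b"}))
    print(usable_prefix(index, {"a", "c"}))
    print(usable_prefix(index, {"b", "c"}))
```

Conditions on a and b use the first two index columns, conditions on a and c use only a for the index search, and conditions on b and c alone cannot use the index for searching at all in this model.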

Using a joint index to optimize ORDER BY

A joint index can also reduce sorting at the MySQL server layer. For example, suppose the user order table has a joint index (user_id, buy_date) and a single-column index (user_id). (Both exist here only to demonstrate the joint index. In a real project the joint index alone is enough because, as mentioned above, (a,b) already serves queries on a as well as on (a,b).)

KEY `idx_user_id` (`user_id`),
KEY `idx_user_id_buy_date` (`user_id`,`buy_date`)

If you just query a user's orders, the optimizer uses the single-column user_id index, as follows:

mysql> explain select user_id, order_id from t_order where user_id = 1;
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys                    | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date | idx_user_id | 4       | const |    4 |   100.00 | Using index |
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

However, when you need to sort by the purchase date buy_date, for example to retrieve a user's first three purchase records in date order, both the single-column index on user_id and the joint index (user_id, buy_date) are candidates. The optimizer chooses the joint index because, for a given user_id, its entries are already sorted by buy_date, so no further sort is needed at the MySQL server layer, which improves performance, as follows:

mysql> explain select user_id, order_id from t_order where user_id = 1 order by buy_date limit 3;
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table   | partitions | type | possible_keys                    | key                  | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date | idx_user_id_buy_date | 4       | const |    4 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
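The reason no extra sort is needed can be seen by modeling the joint index as a sorted list of (user_id, buy_date, order_id) tuples. The sample orders and the `first_orders` helper below are hypothetical. The only point is that, within one user_id, the entries already come out in buy_date order.

```python
from bisect import bisect_left
from datetime import date

# Joint index stand-in: entries sorted by user_id, then by buy_date.
# The trailing order_id plays the role of the primary key in the leaf entry.
entries = sorted([
    (1, date(2021, 3, 5), 103),
    (2, date(2021, 1, 9), 104),
    (1, date(2021, 1, 2), 101),
    (1, date(2021, 2, 14), 102),
])


def first_orders(user_id: int, limit: int) -> list:
    """Return up to `limit` (buy_date, order_id) pairs, earliest first."""
    start = bisect_left(entries, (user_id,))
    result = []
    for uid, buy_date, order_id in entries[start:]:
        if uid != user_id or len(result) == limit:
            break  # left this user's range or collected enough rows
        result.append((buy_date, order_id))
    return result


if __name__ == "__main__":
    for buy_date, order_id in first_orders(1, 3):
        print(buy_date, order_id)
```

The function reads consecutive entries and stops after `limit` rows without ever sorting, which mirrors how an index on (user_id, buy_date) can satisfy ORDER BY buy_date for a fixed user_id.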

If the joint index idx_user_id_buy_date is dropped, the same query has to sort its result, and Using filesort appears in the Extra column:

mysql> alter table t_order drop index idx_user_id_buy_date;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> explain select user_id, order_id from t_order where user_id = 1 order by buy_date limit 3;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | t_order | NULL       | ALL  | idx_user_id   | NULL | NULL    | NULL |    4 |   100.00 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)

