Overview

In a database, an index speeds up data lookup, much like the table of contents of a book. For an SQL query, the index can be used to quickly filter out data that does not meet the conditions and locate data that does, so the entire table does not have to be scanned. In the InnoDB storage engine, indexes are mainly implemented as B+ trees. Non-leaf nodes store index keys. Leaf nodes store either the data records themselves (in the primary key index, also called the clustered index) or primary key values (in auxiliary indexes). All leaf nodes are on the same level and are connected by pointers into a doubly linked list, which makes it easy to traverse all records or a range of records.

B-tree, B+tree

Both the B-tree and the B+ tree are multi-way balanced search trees. They reduce the height of the tree by storing more keys in each node, and they keep the tree balanced through rotation and splitting operations, which reduces the number of disk accesses needed to retrieve data. A major difference between the two is that the leaf nodes of a B+ tree are linked to their neighbors by pointers, specifically through a doubly linked list, so the B+ tree is very well suited to range searches. For details, please refer to: Data Structure - Tree (III): Multi-way Search Tree B-tree, B+ tree

Both the clustered and the non-clustered indexes of the InnoDB storage engine are implemented as B+ trees. InnoDB uses the primary key index as the clustered index of the table. In a clustered index, the non-leaf nodes store the primary key as the search key, and the leaf nodes (also called data pages) store the actual data records. Data records are stored from left to right in key order. The clustered index is therefore, in effect, the way the table's data is stored, which is why each table can have only one clustered index.
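The leaf-level linked list described above can be sketched in a few lines of Python. This is only an illustrative model, not InnoDB's actual page format: the names LeafPage, build_leaf_level and range_scan, the page size of 3, and the sample rows are all assumptions made for the example, and a binary search over each page's first key stands in for descending the non-leaf levels.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LeafPage:
    """One leaf page of a clustered index: rows kept in primary-key order."""
    rows: list                      # list of (primary_key, record) tuples
    prev: Optional["LeafPage"] = field(default=None, repr=False)
    next: Optional["LeafPage"] = field(default=None, repr=False)


def build_leaf_level(rows, page_size=3):
    """Split sorted rows into pages and link neighbours in both directions."""
    rows = sorted(rows)
    pages = [LeafPage(rows[i:i + page_size]) for i in range(0, len(rows), page_size)]
    for left, right in zip(pages, pages[1:]):
        left.next, right.prev = right, left
    return pages


def range_scan(pages, low, high):
    """Return records with low <= primary key <= high."""
    if not pages:
        return []
    # Stand-in for descending the non-leaf levels: pick the last page whose
    # first key is <= low (or the first page if low precedes every key).
    first_keys = [page.rows[0][0] for page in pages]
    page = pages[max(bisect_right(first_keys, low) - 1, 0)]
    result = []
    while page is not None:
        for key, record in page.rows:
            if key > high:
                return result
            if key >= low:
                result.append(record)
        page = page.next            # follow the leaf link, no new descent from the root
    return result


pages = build_leaf_level([(pk, f"row-{pk}") for pk in range(1, 11)])
print(range_scan(pages, 4, 8))      # ['row-4', 'row-5', 'row-6', 'row-7', 'row-8']
```

The point of the sketch is that only the first page is located through the tree. Every later page is reached through the next pointer, which is what makes range scans on the primary key cheap.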
A table in the InnoDB storage engine is also called an index-organized table. The structure is as follows: (Figure from "MySQL Technology Insider: InnoDB Storage Engine")

If a query searches for data by primary key, that is, the key column of the EXPLAIN output shows PRIMARY, the lookup is at its most efficient. The leaf node stores the data record itself, so the record can be returned directly, without the additional lookup in the primary key index that a non-clustered index requires.

For ORDER BY, whether ASC or DESC, if the ORDER BY column is the primary key, the B+ tree of the primary key index is already ordered. The data returned by the storage engine is therefore already sorted by primary key, and no sorting is needed at the MySQL server level, which improves performance. If the Extra column of the EXPLAIN output shows Using filesort, sorting is required at the MySQL server level, which may involve a temporary table or an external file sort. This case generally deserves optimization.

For range searches on the primary key, the leaf nodes of the clustered index are connected by a doubly linked list in primary key order, so the records in a given range can be found quickly.

Auxiliary index

An auxiliary index, also known as a secondary index, is a non-clustered index. It is generally created to improve the efficiency of particular queries: when a query filters on the indexed column, the auxiliary index is used to avoid a full table scan. Because an auxiliary index is not a clustered index, each table can have multiple auxiliary indexes. Their structure is as follows: the non-leaf nodes store the keys of the indexed column, and the leaf nodes store the primary key values of the corresponding rows in the clustered index (the primary key index).
That is, after the matching entries have been located in the auxiliary index, if the index does not cover all the columns the query needs, the primary key values read from the auxiliary index leaves are used to search the clustered index, find the corresponding leaf page, and read the full data record. The whole lookup therefore has two steps: a search in the auxiliary index, followed by a search in the clustered index (that is, the primary key index). The second step is known as a back-to-table query.
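The two-step lookup can be sketched as follows. This is a simplified model rather than InnoDB's implementation: each B+ tree is reduced to a dictionary of leaf entries, one descent is assumed to cost one page read per level, both trees are assumed to be three levels deep, and the table, the Index class and the unique name column are hypothetical.

```python
TREE_HEIGHT = 3  # assumed height of both B+ trees; real heights depend on the data


class Index:
    """A B+ tree reduced to a dict of leaf entries plus a page-read counter."""

    def __init__(self, entries, height=TREE_HEIGHT):
        self.entries = dict(entries)
        self.height = height
        self.page_reads = 0

    def search(self, key):
        # Descending from root to leaf touches one page per level.
        self.page_reads += self.height
        return self.entries.get(key)


# Clustered index: primary key -> full row.
clustered = Index({1: {"id": 1, "name": "alice", "age": 30},
                   2: {"id": 2, "name": "bob", "age": 25}})
# Auxiliary index on name: indexed value -> primary key.
idx_name = Index({"alice": 1, "bob": 2})


def find_by_name(name):
    primary_key = idx_name.search(name)       # step 1: auxiliary index
    if primary_key is None:
        return None
    return clustered.search(primary_key)      # step 2: back to the clustered index


row = find_by_name("bob")
total_reads = idx_name.page_reads + clustered.page_reads
print(row, total_reads)                       # total_reads is 6 (3 + 3)
```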
The above process requires a total of 6 disk I/Os (for example, three page reads to descend an auxiliary index of height 3, then three more to descend a clustered index of height 3). If many rows need to be queried, the required disk I/O grows in proportion to the number of rows and query performance drops. Auxiliary indexes should therefore be created on columns with a high degree of filtering, that is, columns with little duplicate data.

Cardinality: the degree of duplication in the index column

From the above analysis, when a query goes through an auxiliary index, needs a back-to-table lookup, and matches many rows, a large amount of disk I/O is required to fetch the data. Such an index does not improve query performance and can even reduce it. In addition, when the MySQL optimizer estimates that many rows will be returned, it may give up the index and perform a full table scan directly. The columns chosen for an auxiliary index should therefore have low duplication, so that a typical query returns only one or two rows. If a column has too many duplicate values, consider not creating a secondary index on it.
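The idea of a column's degree of duplication can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not how MySQL estimates statistics: the function name selectivity and both sample columns are hypothetical, and the ratio is computed exactly over an in-memory list instead of being sampled.

```python
def selectivity(values):
    """Distinct values divided by total rows; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


order_ids = list(range(1, 1001))        # hypothetical column, unique per row
is_paid = ["yes", "no"] * 500           # hypothetical column, two values over 1000 rows

print(selectivity(order_ids))           # 1.0
print(selectivity(is_paid))             # 0.002
```

A ratio close to 1 suggests the column filters well. A ratio close to 0 suggests that an equality condition on the column matches a large share of the table, which is the situation where a secondary index tends not to pay off.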
In MySQL, you can use SHOW INDEX FROM to check the Cardinality value:

mysql> SHOW INDEX FROM store_order;
+---------------+------------+------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name   | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| store_order   |          0 | PRIMARY    |            1 | store_id     | A         |         201 |     NULL | NULL   |      | BTREE      |         |               |
| store_order   |          1 | idx_expire |            1 | expire_date  | A         |          68 |     NULL | NULL   | YES  | BTREE      |         |               |
| store_order   |          1 | idx_ul     |            1 | ul           | A         |          22 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------------+------------+------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.01 sec)

Cardinality is the estimated number of unique values in the index column. If it is close to the number of rows in the table, the column has few duplicate values and filters well. If the ratio of Cardinality to the total number of rows is very small, as with a gender column that contains only the two values "male" and "female", the column has a large number of duplicate values and you should consider whether to drop the index.

Covering Index
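As a minimal sketch of the idea behind this heading, the check below decides whether a query can be answered from an auxiliary index alone. It relies only on the fact stated earlier that auxiliary index leaves hold the indexed columns plus the primary key. The function name needs_back_to_table, the assumption that order_id is the primary key, and the amount column are hypothetical.

```python
def needs_back_to_table(selected_columns, index_columns, primary_key_columns):
    """True if the query must visit the clustered index to fetch more columns."""
    # An auxiliary index leaf holds the indexed columns plus the primary key.
    available = set(index_columns) | set(primary_key_columns)
    return not set(selected_columns) <= available


# Index on (user_id), primary key assumed to be order_id.
print(needs_back_to_table(["user_id", "order_id"], ["user_id"], ["order_id"]))  # False
print(needs_back_to_table(["user_id", "amount"], ["user_id"], ["order_id"]))    # True
```

When the check returns False, the index covers the query and the second lookup in the clustered index is skipped. In EXPLAIN output this case shows up as Using index in the Extra column.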
Leftmost prefix matching of joint indexes
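The reason a joint index can only be used from its leftmost column can be illustrated with a sorted list of tuples. This is a hedged sketch, not InnoDB code: the column names a and b, the sample entries, and the helper functions seek_by_a and scan_by_b are made up for the illustration.

```python
from bisect import bisect_left, bisect_right

# Leaf entries of a joint index (a, b), kept sorted by a first, then by b.
entries = sorted([(1, 30), (1, 10), (2, 20), (2, 5), (3, 10)])


def seek_by_a(a):
    """Binary search works: all entries with the same a are contiguous."""
    keys = [entry[0] for entry in entries]
    return entries[bisect_left(keys, a):bisect_right(keys, a)]


def scan_by_b(b):
    """b alone is not globally ordered, so every entry must be examined."""
    return [entry for entry in entries if entry[1] == b]


print(seek_by_a(2))    # [(2, 5), (2, 20)]
print(scan_by_b(10))   # [(1, 10), (3, 10)]
```

Within one value of a the entries are ordered by b, but across the whole index the b values appear as 10, 30, 5, 20, 10, which is not sorted. A condition on b alone therefore cannot use the ordering of the index, while a condition on a, or on a and b together, can.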
Using a joint index to optimize ORDER BY

In addition, you can consider using a joint index to reduce sorting at the MySQL server level. For example, the user order table contains a joint index (user_id, buy_date) and a single-column index (user_id). (Note that both are defined here only to demonstrate the joint index. In real projects the joint index alone is enough: as mentioned above, (a, b) does the work of both an index on a and an index on (a, b).)

KEY `idx_user_id` (`user_id`),
KEY `idx_user_id_buy_date` (`user_id`,`buy_date`)

If you just query a user's orders, the user_id index is used, as follows:

mysql> explain select user_id, order_id from t_order where user_id = 1;
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys                    | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date | idx_user_id | 4       | const |    4 |   100.00 | Using index |
+----+-------------+---------+------------+------+----------------------------------+-------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

However, when you need to sort by the purchase date buy_date and retrieve only three of the user's purchase records, both the single-column index on user_id and the joint index (user_id, buy_date) are candidates.
The MySQL optimizer chooses the joint index because buy_date is already sorted within each user_id in that index, so no further sorting is needed at the MySQL server level, which improves performance, as follows:

mysql> explain select user_id, order_id from t_order where user_id = 1 order by buy_date limit 3;
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table   | partitions | type | possible_keys                    | key                  | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | t_order | NULL       | ref  | idx_user_id,idx_user_id_buy_date | idx_user_id_buy_date | 4       | const |    4 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+------+----------------------------------+----------------------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)

If the joint index idx_user_id_buy_date is dropped, Using filesort is displayed:

mysql> alter table t_order drop index idx_user_id_buy_date;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select user_id, order_id from t_order where user_id = 1 order by buy_date limit 3;
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | t_order | NULL       | ALL  | idx_user_id   | NULL | NULL    | NULL |    4 |   100.00 | Using where; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
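Why the joint index removes the sort step can be sketched with a sorted list standing in for the leaf entries of (user_id, buy_date). This is an illustrative model only: the sample orders, the dates, and the function first_orders are hypothetical, and each entry carries order_id on the assumption that it is the table's primary key.

```python
from bisect import bisect_left
from datetime import date

# Leaf entries of (user_id, buy_date), each carrying the primary key order_id.
idx_user_id_buy_date = sorted([
    (1, date(2023, 5, 3), 104),
    (2, date(2023, 5, 1), 102),
    (1, date(2023, 5, 1), 101),
    (1, date(2023, 5, 2), 103),
    (1, date(2023, 5, 4), 105),
])


def first_orders(user_id, limit):
    """Rough equivalent of: WHERE user_id = ? ORDER BY buy_date LIMIT ?"""
    # (user_id,) sorts before every (user_id, buy_date, order_id) entry,
    # so this finds the first entry for that user.
    start = bisect_left(idx_user_id_buy_date, (user_id,))
    result = []
    for uid, buy_date, order_id in idx_user_id_buy_date[start:]:
        if uid != user_id or len(result) == limit:
            break
        result.append((order_id, buy_date))
    return result            # already in buy_date order; no sort is performed


print(first_orders(1, 3))
```

The function seeks to the first entry for the user and reads forward until the limit is reached. Because the entries for one user_id are stored in buy_date order, the rows come out sorted without any separate sorting pass, which corresponds to the absence of Using filesort in the EXPLAIN output.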