Index ModelHash Table
Ordered array: The performance in both equal value query and range query scenarios is excellent, but inserting and deleting data requires data movement, which is too costly. Therefore, it only applies to static storage engines. Binary balanced tree: The left child of each node is smaller than the parent node, and the parent node is smaller than the right child. The time complexity is O(log(N)) Multi-branch balanced tree: The index is not only stored in memory, but also written to disk. In order to minimize disk reads for a query, the query process must access as few data blocks as possible. Therefore, an "N-ary" tree is used. B+TreeB-Tree and B+Tree B-Tree B+Tree InnoDB uses the B+ tree index model. Suppose we have a table with a primary key column of ID, a field k in the table, and an index on k, as shown below:
Precautions
// Assume a data page of 16KB, a row of data of 1KB, an index pointer of 6 bytes, and an index field of bigint type (8 bytes) // Number of indexes K = 16*1024/(8+6) = 1170 // Number of records in a single leaf node N = 16/1 = 16 // Layer 3 B+ record number V = K*K*N = 21902400 MyISAM also uses B+Tree indexes. The difference is that it does not distinguish between primary key and non-primary key indexes. Both are non-clustered indexes. The leaf nodes store pointers to data files. Index selectionThe purpose of the optimizer selecting an index is to find an optimal execution plan and execute the statement at the lowest cost. In the database, the number of scanned rows is one of the factors that affect the execution cost. Fewer rows scanned means fewer disk accesses and less CPU consumption. Of course, the number of scanned rows is not the only criterion for judgment. The optimizer will also make a comprehensive judgment based on factors such as whether to use temporary tables and whether to sort. How to calculate the number of scan lines The more distinct values an index has, the more discriminative the index is. The number of distinct values in an index is called cardinality. -- View the current index base mysql> show index from test; +-------+------------+----------+--------------+--------------+--------------+-------------+------+--------+------+------------+---------+------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+--------------+--------------+-------------+------+--------+------+------------+---------+------------+ | test | 0 | PRIMARY | 1 | id | A | 100256 | NULL | NULL | | BTREE | | | | test | 1 | index_a | 1 | a | A | 98199 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+--------------+--------------+-------------+------+--------+------+------------+---------+------------+ From a performance perspective, InnoDB uses sampling statistics. By default, it selects N data pages, counts the different values on these pages, gets an average value, and then multiplies it by the number of pages of this index to get the cardinality of this index. Therefore, the two indexes above do not show the same cardinality. The data table will be continuously updated, and the index statistics will not remain fixed. Therefore, when the number of changed data rows exceeds 1/M (the default is 10 when innodb_stats_persistent=on, and 16 otherwise), it will automatically trigger a re-index statistics. mysql> show variables like '%innodb_stats_persistent%'; +--------------------------------------+-------------+ | Variable_name | Value | +--------------------------------------+-------------+ -- Whether to automatically trigger the update of statistics. When the modified data exceeds 10%, the statistics will be recalculated | innodb_stats_auto_recalc | ON | -- Controls whether delete-marked records are considered when recalculating statistics | innodb_stats_include_delete_marked | OFF | -- Statistical method for null values. When the variable is set to nulls_equal, all NULL values are considered the same | innodb_stats_method | nulls_equal | -- Whether to trigger the update of statistics when operating metadata | innodb_stats_on_metadata | OFF | --Whether statistics are stored persistently | innodb_stats_persistent | ON | -- innodb_stats_persistent=on, number of sample pages for persistent statistics sampling | innodb_stats_persistent_sample_pages | 20 | -- Deprecated, replaced by innodb_stats_transient_sample_pages | innodb_stats_sample_pages | 8 | -- Transient sample page number | innodb_stats_transient_sample_pages | 8 | +--------------------------------------+-------------+
-- Recalculate index information mysql> analyze table t; Effect of sorting on index selection -- Create a table mysql> CREATE TABLE `t` ( `id` int(11) NOT NULL, `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`), KEY `b` (`b`) )ENGINE=InnoDB; -- Define the test data storage procedure mysql> delimiter; CREATE PROCEDURE idata () BEGIN DECLARE i INT ; SET i = 1 ; WHILE (i <= 100000) DO INSERT INTO t VALUES (i, i, i) ; SET i = i + 1 ; END WHILE ; END; delimiter ; -- Execute the stored procedure and insert test data mysql> CALL idata (); -- View the execution plan, using the index on field a mysql> explain select * from t where a between 10000 and 20000; +----+-------------+-------+-------+---------------+-----+----------+------+-------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+-----+----------+------+-------+-----------------------+ | 1 | SIMPLE | t | range | a | a | 5 | NULL | 10000 | Using index condition | +----+-------------+-------+-------+---------------+-----+----------+------+-------+-----------------------+ -- Since field b needs to be sorted, although index b needs to scan more rows, it is ordered. Considering the number of rows to be scanned and the sorting, the optimizer chooses index b, thinking that the cost is smaller.mysql> explain select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1; +----+-------------+-------+-------+---------------+-----+------+------+------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+-----+------+------+------+------+------------------------------------+ | 1 | SIMPLE | t | range | a,b | b | 5 | NULL | 50128 | Using index condition; Using where | +----+-------------+-------+-------+---------------+-----+------+------+------+------+------------------------------------+ -- Solution 1: Use force index to force index a to correct the optimizer's incorrect choice. This is not recommended (not universal, and the statement needs to be changed if the index name is changed) mysql> explain select * from t force index(a) where (a between 1 and 1000) and (b between 50000 and 100000) order by b limit 1; +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ | 1 | SIMPLE | t | range | a | a | 5 | NULL | 999 | Using index condition; Using where; Using filesort | +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ -- Solution 2: Guide MySQL to use the desired index and sort by b, a. The optimizer needs to consider the cost of sorting a. mysql> explain select * from t where (a between 1 and 1000) and (b between 50000 and 100000) order by b, a limit 1; +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ | 1 | SIMPLE | t | range | a,b | a | 5 | NULL | 999 | Using index condition; Using where; Using filesort | +----+-------------+-------+-------+---------------+-----+------+------+------+----------------------------------------------------+ -- Solution 3: In some scenarios, we can create a more appropriate index to provide to the optimizer for selection, or delete the misused index ALTER TABLE `t` DROP INDEX `a`, DROP INDEX `b`, ADD INDEX `ab` (`a`,`b`); Index optimization Index selectivityIndex selectivity = cardinality / total number of rows -- Selective index of field xxx in table t: select count(distinct xxx)/count(id) from t; The selectivity of an index refers to the ratio of unique index values (cardinality) to the number of table records. Selectivity is an indicator of the index's screening ability. The index value range is 0~1. The greater the selectivity, the greater the index value. When using a normal index query, the normal index is loaded first, the primary key of the actual row is queried through the normal index, and then the primary key is used to query the corresponding row through the clustered index, so as to query all rows in a loop. If you search the clustered index directly in its entirety, you do not need to switch back and forth between the normal index and the clustered index. Compared with the total cost of the two operations, scanning the entire table may be more efficient. In actual work, it still depends on the business situation. If the data distribution is uneven, the actual query conditions always query the part with less data. Adding an index on the column with lower index selection may also have a good effect. Covering IndexCovering indexes can reduce the number of tree searches and significantly improve query performance, so using covering indexes is a common performance optimization method. -- You only need to check the value of ID, and the value of ID is already in the k index tree, so you can directly provide the query result without going back to the table select ID from T where k between 3 and 5 -- Add field V. V needs to be returned for each query. You can consider making k and v into a joint index select ID,V from T where k between 3 and 5 Leftmost prefix principle + index pushdown-- Create a joint index on the three columns of id, name, and age. -- Satisfy the leftmost prefix principle, and both name and age are indexed. select * from T where name='xxx' and age=12 -- Mysql automatically optimizes and adjusts the order of name and age. Both name and age are indexed. select * from T where age=12 and name='xxx' -- If name satisfies the leftmost prefix principle, the index is used. MySQL 5.6 introduces index condition pushdown optimization, that is, the records that do not satisfy age=12 are first filtered out in the index and then returned to the table. select * from T where name like 'xxx%' and age=12 -- Does not meet the leftmost prefix principle, no index is used select * from T where name like '%xxx%' and age=12 -- Satisfy the leftmost prefix principle, name uses the index select * from T where name='xxx' -- Does not meet the leftmost prefix principle, does not use the index select * from T where age=12 Principles of establishing joint indexes:
Prefix Indexmysql> create table SUser( ID bigint unsigned primary key, name varchar(64), email varchar(64), ... )engine=innodb; -- The following query scenariomysql> select name from SUser where email='xxx'; -- Solution 1: Full-text index. The number of table returns is determined by the amount of data that meets the conditions. mysql> alter table SUser add index index1(email); -- Solution 2: prefix index. The number of table returns is determined by the prefix matching result. mysql> alter table SUser add index index2(email(6)); Prefix index can save space, but you need to pay attention to the definition of prefix length. While saving space, you should not increase the query cost too much, that is, reduce the number of table return verifications. How to set the appropriate prefix length? -- Preset an acceptable discrimination loss ratio and select the smallest prefix length that meets the conditions select count(distinct left(email,n))/count(distinct email) from SUser; What if the appropriate prefix length is longer? For example, if the ID card number meets the requirements for differentiation, a prefix index of more than 12 digits may be required. The space saved is limited and the query cost is increased. Therefore, there is no need to use a prefix index. At this point, we can consider using the following methods: Reverse storage -- String reversal query during query mysql> select field_list from t where id_card = reverse('input_id_card_string'); Using hash fields -- Create an integer field to store the ID card verification code and create an index on this field mysql> alter table t add id_card_crc int unsigned, add index(id_card_crc); -- Use the hash field to search by index, and then use the original field precision to filter mysql> select field_list from t where id_card_crc=crc32('input_id_card_string') and id_card='input_id_card_string' Disadvantages of the above two methods:
What is the impact of prefix index on covering index? -- If you use a prefix index, you won't need a covering index to optimize query performance. select id,email from SUser where email='xxx'; Unique IndexIt is recommended to use a normal index. The unique index cannot use the change buffer and has a low memory hit rate. Index failure
SummarizeThis is the end of this article about MySQL index selection and optimization. For more relevant MySQL index selection and optimization content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: Web Design TabIndex Element
>>: Use elasticsearch to delete index data regularly
Preface tcpdump is a well-known command-line pack...
It is essentially a common js object used to desc...
1. The first method is to use the unhup command d...
Table of contents Node connects to Mysql Install ...
Table of contents 1. Compiler code format specifi...
The first one : Copy code The code is as follows: ...
Use CSS to modify the browser scroll bar style ::...
1. MyISAM storage engine shortcoming: No support ...
<br />In one year of blogging, I have person...
After spending half the night on it, I finally ma...
Tutorial Series MySQL series: Basic concepts of M...
This article uses examples to describe the common...
Table of contents 1. Instructions for use 2. Prep...
This article summarizes various ways to implement...
After the release of CentOS8.0-1905, we tried to ...