OverviewThe following are common scenarios where indexes need to be created. For comparison, a test table is created (a has an index, d has no index): mysql> create table test( --Create test table-> id int(10) not null AUTO_INCREMENT, -> a int(10) default null, -> b int(10) default null, -> c int(10) default null, -> d int(10) default null, -> primary key(id), --primary key index-> key idx_a(a), --auxiliary index-> key idx_b_c(b,c) --joint index-> )engine=InnoDB charset=utf8mb4; Query OK, 0 rows affected, 5 warnings (0.09 sec) mysql> drop procedure if exists insert_test_data; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> delimiter | --Create a stored procedure to insert 100,000 data mysql> create procedure insert_test_data() -> begin -> declare i int; -> set i=1; -> while(i<=100000) do -> insert into test(a,b,c,d)values(i,i,i,i); -> set i=i+1; -> end while; -> end | Query OK, 0 rows affected (0.11 sec) mysql> delimiter; mysql> call insert_test_data(); --Execute stored procedure Query OK, 1 row affected (11 min 44.13 sec) Add indexes to conditional fields when retrieving data Aggregate functions add indexes to aggregate fields Add an index to the sort field To prevent adding indexes back to the table Add indexes to related fields in related queries It can be seen that after using the index, the query speed optimization is greatly improved. This article will understand MySQL index from the bottom up to practice. From Binary Tree to B+ TreeBinary Tree: A binary tree is a tree data structure with at most two child nodes. A node without a parent node is called a root node, and a node without a child node is called a leaf node. A binary search tree is a tree where the left child of any node is smaller than the key value of the current node, and the right child is larger than the key value of the current node. As shown in the binary search tree below, we only need ⌈ log ( n ) ⌉ ⌈log(n)⌉ ⌈log(n)⌉, that is, three times at most to match the data, while linear search will require nnn times in the worst case. However, the binary tree may degenerate into a linked list, as shown in the figure below, which is equivalent to a full scan, resulting in low efficiency. In order to solve this problem, it is necessary to ensure that the binary tree is always balanced, that is, a balanced binary tree. Balanced binary tree: A balanced binary tree (AVL tree) requires that the height difference between the left and right subtrees of each node cannot exceed 1 on the basis of satisfying the characteristics of a binary tree. It ensures a balance in the tree structure. When inserting or deleting data causes imbalance, node adjustments will be made to maintain balance (the specific algorithm is omitted) to ensure search efficiency. A node in a balanced binary tree corresponds to a key value and data. Every time we look up data, we need to read a node from the disk, which is what we call a disk block. One node corresponds to one disk block. When storing massive amounts of data, the tree will have a large number of nodes, and many disk I/Os will be performed, but the search efficiency is still extremely low. This requires a balanced tree where a single node can store multiple key values and data. B-tree: B-tree (Blance Tree) is a balanced tree that can store multiple key values and data in a single node. Each node is called a page, which is a page of data. Each node stores more key values and data, placing both key values and data in one page, and each node has more child nodes. The number of child nodes is generally called the order. The number of times the B-tree reads the disk when searching for data is greatly reduced, and the search efficiency is much higher than that of AVL. In the 3rd-order B-tree shown in the figure below, search for data with id=42. First, determine in the first page that the key value 42 is greater than 39, find the 4th page according to pointer P3, and then compare it. It is less than the key value 45. Then find the 9th page according to pointer P1 and find that there is a matching key value 42, that is, the corresponding data is found. B+ Tree: B+ tree is a further optimization of B tree. Simply put, the non-leaf nodes of the B+ tree do not store data, but only key values. The reason for this is that the page size in the database is fixed (InnoDB defaults to 16KB). If data is not stored, more key values can be stored, the larger the number of nodes, and the number of disk I/O times for searching data is further reduced. In addition, the order of the B+ tree is equal to the number of its key values. If a node stores 1,000 key values, then only three layers are needed to store 1 billion data. Therefore, generally, only two disk I/Os are required to search for 1 billion data (great). At the same time, the data of B+ tree leaf nodes are arranged in order, so B+ tree is suitable for range search, sorted search and grouped search (the data of B is scattered on the nodes, which is relatively difficult), which is why MySQL uses B+ tree index. Clustered IndexA clustered index is a type of index that reorganizes the actual data on disk and sorts it by the values of one or more specified columns. The physical order of data rows is the same as the logical order of column values (usually the primary key column). There can only be one clustered index in a table (because it can only be stored in one physical order). That is to say, we store data in a B+ tree through InnoDB, and the key value in the B+ tree is the primary key. Then the leaf node in the B+ tree stores all the data in the table (that is, the entire row of data corresponding to the primary key). The data file and the index file are the same file. If the index is found, the data is found, so we call it a clustered index. Clustered indexes are expensive to update. When inserting a new row or updating the primary key, each updated row will be forced to move to a new location (because it is sorted by the primary key). Moving rows may also face page split problems (that is, the page is full). The storage engine will split the page into two pages to accommodate it, and page splits will take up more disk space. That is, the index is rearranged, resulting in a waste of resources. Clustered indexes are suitable for range queries. Clustered index queries are very fast and are particularly suitable for range checks (between, <, <=, >, >=) or group by and order by queries. Because after the clustered index finds the row containing the first value, the rows of subsequent index values are physically connected without further searching, avoiding large-range scans and greatly improving query speed. For example, to query data with id>=19 and id<30: usually the root node resides in the memory (that is, page 1 is already in the memory), first find the key value 19 and its corresponding pointer P2 on page 1, read page 3 through P2 (at this time page 3 is not in the memory and needs to be loaded from the disk), then find the pointer P1 of the key value 19 on page 3, and locate page 8 (also loaded from the disk to the memory). Because the data is linked sequentially in a linked list, the data corresponding to the key value 19 can be found through binary search. After finding the key value 19, because it is a range search, we can query the linked list in the leaf node, traverse and match the conditions that are met in sequence, and continue to find the key value 21. The last data still cannot meet our requirements. At this time, we will use the pointer P of page 8 to read the data of page 9. Page 9 is not in the memory and also needs to be loaded from the disk and read into the memory. Then, it will be deduced from time to time until the key value 34 is matched and the conditions are not met, then it will terminate. This is a method of finding data through a clustered index. Nonclustered index A non-clustered index or non-clustered index (Secondary Index) is a B+ tree index built with columns other than the primary key as key values. The logical order of the index in the index is different from the physical storage order of the rows on the disk. A table can have multiple non-clustered indexes. In For example, define a data table test, which is composed of
When retrieving data, first search in the index tree A table can have more than one non-clustered index. In fact, each table can have up to 249 non-clustered indexes. However, each time a new index is created for a field, the data in the field will be copied to generate the index. Therefore, adding an index to a table will increase the size of the table and take up a lot of disk space and memory. So if disk space and memory are limited, the number of non-clustered indexes should be limited. In addition, every time you change the data in a table with a nonclustered index, the index must be updated at the same time, so the nonclustered index will slow down the insertion and update speed. For example, to search for data 36, two numbers are used. The first number 36 represents the key value of the index, and the second number 64 represents the primary key of the data. So after we find 36, we don’t get the data, we still have to go to the clustered index table to find the data based on its corresponding primary key. Joint index and covering indexA joint index, as the name implies, is an index that combines multiple columns on a table. When creating a joint index, the most frequently used columns will be placed on the left according to business needs, because MySQL index queries follow the principle of leftmost prefix matching. The leftmost prefix matching principle means that the leftmost is first. When retrieving data, the matching starts from the leftmost of the joint index. So when we create a joint index, such as (a, b, c), it is equivalent to creating three indexes: (a), (a, b), and (a, b, c). This is the leftmost matching principle. A covering index is just a joint index specific to a specific select statement. That is to say, for a select statement, a joint index can directly obtain the query results through the index without returning to the table for query. This joint index is said to cover this select statement. It can perfectly solve the problem of non-clustered index back table query, but the premise is to pay attention to the leftmost matching principle of the index when querying. B+Tree Index VS Hash Indexprinciple:
Hash indexes are suitable for precise queries of equal values of a large number of different data, but do not support fuzzy queries or range queries, cannot be used for sorting, and do not support the leftmost matching principle of joint indexes. In addition, when there are a large number of duplicate key values, there will be hash collision problems. Normal index and unique indexDuplicate values can be written to the fields of a common index, but duplicate values cannot be written to the fields of a unique index. First introduce Insert Buffer and Change Buffer:
The unique index uses Insert Buffer because to determine whether the uniqueness constraint is violated, if everything has been read into the memory, it will be faster to directly update the memory, and there is no need to use Change Buffer. After a normal index finds the first record that meets the conditions, it continues to search for the next record until the conditions are no longer met. For a unique index, the search ends when the first record is found. However, InnoDB reads data into memory page by page, and the data that meets the requirements later may all be in the previous data pages, so the memory consumption of several more scans of ordinary indexes can be ignored. summary:
InnoDB VS MyISAM
In contrast, you can basically consider using Let's expand on why
MVCC (Multi-Version Concurrency Control) InnoDB adds two additional hidden values (creation version number and deletion version number) to each row record to implement MVCC, one to record the row data creation time, and the other to record the row data expiration/deletion time. However, InnoDB does not store the actual time when these events occurred. Instead, it only stores the system version number when these events occurred. As transactions are created and grow, each transaction records its own system version number at the beginning, and each query must check whether the version number of each row of data is the same as the transaction version number. That is to say, the creation version number of each row of data is not greater than the transaction version number to ensure that the row data exists before the transaction is created; the deletion version number of the row data is greater than the transaction version number or is undefined to ensure that the row data is not deleted before the transaction starts. Analyze index usage with explain Use the test table outlined in the previous section to test: mysql> explain select * from test where a=88888; +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+ | 1 | SIMPLE | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | NULL | +----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+ 1 row in set, 1 warning (0.03 sec) mysql> explain select b,c from test where b=88888; +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | test | NULL | ref | idx_b_c | idx_b_c | 5 | const | 1 | 100.00 | Using index | +----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql> explain select * from test where a=(select a from test where a=88888); +----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+ | 1 | PRIMARY | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | Using where | | 2 | SUBQUERY | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | Using index | +----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+ 2 rows in set, 1 warning (0.00 sec) Just focus on these three columns:
type (shows which table this row of data is about)
extra (details to resolve the query)
SummarizeThis article ends here. I hope it can be helpful to you. I also hope you can pay more attention to more content on 123WORDPRESS.COM! You may also be interested in:
|
<<: The combination and difference between ENTRYPOINT and CMD in dockerfile
>>: Introduction to the use of alt and title attributes of HTML img tags
Table of contents Preface Computed properties Int...
The PHP base image used in this article is: php:7...
grammar background: linear-gradient(direction,col...
This article shares the specific code of JavaScri...
1. If MySQL is not started successfully, check th...
Since Uniapp does not have DingTalk authorization...
MySQL can be set when it is installed, but it see...
1. How to create a user and password 1. Enter the...
Related article: Beginners learn some HTML tags (1...
Table of contents The first The second Native Js ...
Table of contents Set is a special collection who...
Limit input box to only pure numbers 1、onkeyup = ...
Why is NULL so often used? (1) Java's null Nu...
This article shares the installation tutorial of ...
The installation of Harbor is pretty simple, but ...