What is a MySQL index? Ask if you don't understand

What is a MySQL index? Ask if you don't understand

Overview

The following are common scenarios where indexes need to be created. For comparison, a test table is created (a has an index, d has no index):

mysql> create table test( --Create test table-> id int(10) not null AUTO_INCREMENT,
    -> a int(10) default null,
    -> b int(10) default null,
    -> c int(10) default null,
    -> d int(10) default null,
    -> primary key(id), --primary key index-> ​​key idx_a(a), --auxiliary index-> ​​key idx_b_c(b,c) --joint index-> ​​)engine=InnoDB charset=utf8mb4;
Query OK, 0 rows affected, 5 warnings (0.09 sec)
mysql> drop procedure if exists insert_test_data;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> delimiter | --Create a stored procedure to insert 100,000 data mysql> create procedure insert_test_data()
    -> begin
    -> declare i int;
    -> set i=1;
    -> while(i<=100000) do
    -> insert into test(a,b,c,d)values(i,i,i,i);
    -> set i=i+1;
    -> end while;
    -> end |
Query OK, 0 rows affected (0.11 sec)
mysql> delimiter;
mysql> call insert_test_data(); --Execute stored procedure Query OK, 1 row affected (11 min 44.13 sec)

Add indexes to conditional fields when retrieving data

insert image description here

Aggregate functions add indexes to aggregate fields

insert image description here

Add an index to the sort field

insert image description here

To prevent adding indexes back to the table

insert image description here

Add indexes to related fields in related queries

insert image description here

It can be seen that after using the index, the query speed optimization is greatly improved. This article will understand MySQL index from the bottom up to practice.

From Binary Tree to B+ Tree

Binary Tree:

A binary tree is a tree data structure with at most two child nodes. A node without a parent node is called a root node, and a node without a child node is called a leaf node.

A binary search tree is a tree where the left child of any node is smaller than the key value of the current node, and the right child is larger than the key value of the current node.

As shown in the binary search tree below, we only need ⌈ log ( n ) ⌉ ⌈log(n)⌉ ⌈log(n)⌉, that is, three times at most to match the data, while linear search will require nnn times in the worst case.

insert image description here

However, the binary tree may degenerate into a linked list, as shown in the figure below, which is equivalent to a full scan, resulting in low efficiency. In order to solve this problem, it is necessary to ensure that the binary tree is always balanced, that is, a balanced binary tree.

insert image description here

Balanced binary tree:

A balanced binary tree (AVL tree) requires that the height difference between the left and right subtrees of each node cannot exceed 1 on the basis of satisfying the characteristics of a binary tree. It ensures a balance in the tree structure. When inserting or deleting data causes imbalance, node adjustments will be made to maintain balance (the specific algorithm is omitted) to ensure search efficiency.

insert image description here

A node in a balanced binary tree corresponds to a key value and data. Every time we look up data, we need to read a node from the disk, which is what we call a disk block. One node corresponds to one disk block. When storing massive amounts of data, the tree will have a large number of nodes, and many disk I/Os will be performed, but the search efficiency is still extremely low. This requires a balanced tree where a single node can store multiple key values ​​and data.

B-tree: B-tree (Blance Tree) is a balanced tree that can store multiple key values ​​and data in a single node. Each node is called a page, which is a page of data. Each node stores more key values ​​and data, placing both key values ​​and data in one page, and each node has more child nodes. The number of child nodes is generally called the order. The number of times the B-tree reads the disk when searching for data is greatly reduced, and the search efficiency is much higher than that of AVL.

In the 3rd-order B-tree shown in the figure below, search for data with id=42. First, determine in the first page that the key value 42 is greater than 39, find the 4th page according to pointer P3, and then compare it. It is less than the key value 45. Then find the 9th page according to pointer P1 and find that there is a matching key value 42, that is, the corresponding data is found.

insert image description here

B+ Tree:

B+ tree is a further optimization of B tree. Simply put, the non-leaf nodes of the B+ tree do not store data, but only key values. The reason for this is that the page size in the database is fixed (InnoDB defaults to 16KB). If data is not stored, more key values ​​can be stored, the larger the number of nodes, and the number of disk I/O times for searching data is further reduced.

insert image description here

In addition, the order of the B+ tree is equal to the number of its key values. If a node stores 1,000 key values, then only three layers are needed to store 1 billion data. Therefore, generally, only two disk I/Os are required to search for 1 billion data (great).

At the same time, the data of B+ tree leaf nodes are arranged in order, so B+ tree is suitable for range search, sorted search and grouped search (the data of B is scattered on the nodes, which is relatively difficult), which is why MySQL uses B+ tree index.

Clustered Index

A clustered index is a type of index that reorganizes the actual data on disk and sorts it by the values ​​of one or more specified columns. The physical order of data rows is the same as the logical order of column values ​​(usually the primary key column). There can only be one clustered index in a table (because it can only be stored in one physical order).

InnoDB uses a clustered index. All data in its table will have a primary key. Even if you do not create a primary key, InnoDB will select a Unique key as the primary key. If there is no Unique key defined in the table, InnoDB will add a hidden column named row_id to the table as the primary key.

That is to say, we store data in a B+ tree through InnoDB, and the key value in the B+ tree is the primary key. Then the leaf node in the B+ tree stores all the data in the table (that is, the entire row of data corresponding to the primary key). The data file and the index file are the same file. If the index is found, the data is found, so we call it a clustered index.

Clustered indexes are expensive to update. When inserting a new row or updating the primary key, each updated row will be forced to move to a new location (because it is sorted by the primary key). Moving rows may also face page split problems (that is, the page is full). The storage engine will split the page into two pages to accommodate it, and page splits will take up more disk space. That is, the index is rearranged, resulting in a waste of resources.

Clustered indexes are suitable for range queries. Clustered index queries are very fast and are particularly suitable for range checks (between, <, <=, >, >=) or group by and order by queries. Because after the clustered index finds the row containing the first value, the rows of subsequent index values ​​are physically connected without further searching, avoiding large-range scans and greatly improving query speed.

insert image description here

For example, to query data with id>=19 and id<30: usually the root node resides in the memory (that is, page 1 is already in the memory), first find the key value 19 and its corresponding pointer P2 on page 1, read page 3 through P2 (at this time page 3 is not in the memory and needs to be loaded from the disk), then find the pointer P1 of the key value 19 on page 3, and locate page 8 (also loaded from the disk to the memory). Because the data is linked sequentially in a linked list, the data corresponding to the key value 19 can be found through binary search.

After finding the key value 19, because it is a range search, we can query the linked list in the leaf node, traverse and match the conditions that are met in sequence, and continue to find the key value 21. The last data still cannot meet our requirements. At this time, we will use the pointer P of page 8 to read the data of page 9. Page 9 is not in the memory and also needs to be loaded from the disk and read into the memory. Then, it will be deduced from time to time until the key value 34 is matched and the conditions are not met, then it will terminate. This is a method of finding data through a clustered index.

Nonclustered index

A non-clustered index or non-clustered index (Secondary Index) is a B+ tree index built with columns other than the primary key as key values. The logical order of the index in the index is different from the physical storage order of the rows on the disk. A table can have multiple non-clustered indexes. In InnoDB , except for the primary key index, other indexes can be called auxiliary indexes or secondary indexes.

MyISAM in MySQL uses non-clustered indexes. The storage order of table data has nothing to do with index data. Leaf nodes contain index field values ​​and logical pointers to data page data rows (the number of rows is the same as the amount of data in the data table). Therefore, if you want to find data, you need to search the clustered index based on the primary key. The process of searching for data based on the clustered index is called table return.

For example, define a data table test, which is composed of test.frm , tsst.myd and test.myi :

  • .frm: records the table definition statements.
  • myd: records the real table data.
  • myi: records index data

When retrieving data, first search in the index tree test.myi , get the row position of test.myd where the data is located, and get the data. Therefore, the index file and data file of the MyISAM engine are separate and independent. Finding the index does not mean finding the data, that is, the non-clustered index.

A table can have more than one non-clustered index. In fact, each table can have up to 249 non-clustered indexes. However, each time a new index is created for a field, the data in the field will be copied to generate the index. Therefore, adding an index to a table will increase the size of the table and take up a lot of disk space and memory. So if disk space and memory are limited, the number of non-clustered indexes should be limited.

In addition, every time you change the data in a table with a nonclustered index, the index must be updated at the same time, so the nonclustered index will slow down the insertion and update speed.

insert image description here

For example, to search for data 36, ​​two numbers are used. The first number 36 represents the key value of the index, and the second number 64 represents the primary key of the data. So after we find 36, we don’t get the data, we still have to go to the clustered index table to find the data based on its corresponding primary key.

Joint index and covering index

A joint index, as the name implies, is an index that combines multiple columns on a table. When creating a joint index, the most frequently used columns will be placed on the left according to business needs, because MySQL index queries follow the principle of leftmost prefix matching.

The leftmost prefix matching principle means that the leftmost is first. When retrieving data, the matching starts from the leftmost of the joint index. So when we create a joint index, such as (a, b, c), it is equivalent to creating three indexes: (a), (a, b), and (a, b, c). This is the leftmost matching principle.

A covering index is just a joint index specific to a specific select statement. That is to say, for a select statement, a joint index can directly obtain the query results through the index without returning to the table for query. This joint index is said to cover this select statement. It can perfectly solve the problem of non-clustered index back table query, but the premise is to pay attention to the leftmost matching principle of the index when querying.

B+Tree Index VS Hash Index

principle:

  • B+ tree indexes may need to use binary searches multiple times to find the corresponding data blocks.
  • When using Hash indexing, the Hash function is used to calculate the Hash value and find the corresponding data in the table.

Hash indexes are suitable for precise queries of equal values ​​of a large number of different data, but do not support fuzzy queries or range queries, cannot be used for sorting, and do not support the leftmost matching principle of joint indexes. In addition, when there are a large number of duplicate key values, there will be hash collision problems.

Normal index and unique index

Duplicate values ​​can be written to the fields of a common index, but duplicate values ​​cannot be written to the fields of a unique index. First introduce Insert Buffer and Change Buffer:

  • Insert Buffer
  • For the insertion of non-clustered indexes, first determine whether the inserted non-clustered index is in the cache pool. If it is present, it is inserted directly. If not, it is put into the Insert Buffer first, and then the merge operation of the Insert Buffer and the auxiliary index page child node is mirrored at a certain frequency and situation. Merge multiple insertions into one operation to reduce discrete disk reads. Requires the index to be secondary and not unique.
  • Change Buffer
  • It is an upgraded version of Insert Buffer, which supports deletion and modification in addition to insertion. The usage scenario is set through innodb_change_buffer, which defaults to all (there are also values ​​such as none, inserts, and changes). The maximum memory usage ratio is set through innodb_change_buffer_max_size (default 25%, maximum value 50%), but deadlock may occur under the RR isolation level. It is also required that the index is a secondary index and not unique.

The unique index uses Insert Buffer because to determine whether the uniqueness constraint is violated, if everything has been read into the memory, it will be faster to directly update the memory, and there is no need to use Change Buffer.

After a normal index finds the first record that meets the conditions, it continues to search for the next record until the conditions are no longer met. For a unique index, the search ends when the first record is found. However, InnoDB reads data into memory page by page, and the data that meets the requirements later may all be in the previous data pages, so the memory consumption of several more scans of ordinary indexes can be ignored.

summary:

  • When data is modified, ordinary indexes can use Change Buffer, but unique indexes cannot.
  • When data is modified, unique indexes are more prone to deadlock at the RR isolation level.
  • When querying data, a common index needs to continue to determine the next record after finding a record, while a unique index returns directly after finding the record.
  • When the business requires a field to be unique, if the code can guarantee that a unique value is written, use a normal index, otherwise use a unique index.

InnoDB VS MyISAM

MyISAM InnoDB
Data storage .frm stores table definitions, .myd data files, .myi index files If independent tablespace is not enabled, then .frm file, otherwise idb file
Index Implementation Non-clustered indexes are stored randomly, and only the index is cached. Clustered index sequential storage, cache index and data
Storage Space Can be compressed, takes up less storage space, supports three formats: static table, dynamic table, and compressed table More memory and storage required
Backup and Recovery File storage is cross-platform and can be operated on a single table Copy data files and back up binlogs, which may be very large
Transactions Not supported (nor foreign keys, more emphasis on performance) Support (including advanced features such as foreign keys, security, rollback, etc.)
auto_increment The auto-increment column must be an index and cannot be the first column in a joint index. The auto-increment column must be an index and must be the first column in a joint index.
Lock Support table-level locks Support row-level locks
Full-text index Support FULLTEXT type full-text index Does not support FULLTEXT, but can use Sphinx plugin
Table primary key Allows tables without any indexes or primary keys to exist A hidden primary key will be automatically generated
Total number of rows The total number of rows in the table The total number of rows in the table is not saved, and auxiliary indexes are used to traverse
CRUD Relatively suitable for large-scale queries Relatively suitable for addition, modification and deletion

In contrast, you can basically consider using InnoDB to replace MyISAM . InnoDB is also the current default engine of MySQL, but you need to analyze specific issues specifically, and you can also compare the advantages and disadvantages of the two according to actual conditions and choose a more suitable one.

Let's expand on why MyISAM queries are faster than InnoDB ?

  • InnoDB caches data and indexes; MyISAM only caches indexes, reducing swapping in and out.
  • InnoDB addresses are mapped to blocks and then to rows; MyISAM directly records the file's OFFSET, which makes positioning faster.
  • InnoDB also needs to maintain MVCC consistency. Maybe your scenario does not have it, but it still needs to be checked and maintained.

MVCC (Multi-Version Concurrency Control)

InnoDB adds two additional hidden values ​​(creation version number and deletion version number) to each row record to implement MVCC, one to record the row data creation time, and the other to record the row data expiration/deletion time. However, InnoDB does not store the actual time when these events occurred. Instead, it only stores the system version number when these events occurred. As transactions are created and grow, each transaction records its own system version number at the beginning, and each query must check whether the version number of each row of data is the same as the transaction version number. That is to say, the creation version number of each row of data is not greater than the transaction version number to ensure that the row data exists before the transaction is created; the deletion version number of the row data is greater than the transaction version number or is undefined to ensure that the row data is not deleted before the transaction starts.

Analyze index usage with explain

explain can show the execution effect of SQL statements, which can help choose better indexes and optimize query statements. The syntax is: explain select... from ... [where...]。

Use the test table outlined in the previous section to test:

mysql> explain select * from test where a=88888;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+
| 1 | SIMPLE | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+------+------+------+
1 row in set, 1 warning (0.03 sec)
mysql> explain select b,c from test where b=88888;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | test | NULL | ref | idx_b_c | idx_b_c | 5 | const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test where a=(select a from test where a=88888);
+----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+
| 1 | PRIMARY | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | Using where |
| 2 | SUBQUERY | test | NULL | ref | idx_a | idx_a | 5 | const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+-------+--------+-------+------+------+---------+
2 rows in set, 1 warning (0.00 sec)

Just focus on these three columns: select_type , type , and extra .

select_type value illustrate
SIMPLE Simple query (no joins or subqueries)
PRIMARY Contains related queries or subqueries
UNION The second and subsequent queries in a union query
DEPENDENT UNION The second and subsequent queries in the external related query
UNION RESULT Combined query results
SUBQUERY The first query in the subquery
DEPENDENT SUBQUERY The first query in a subquery that depends on the outer query
DERIVED Queries using derived tables
MATERIALIZED Materialized subquery
UNCACHEABLE SUBQUERY Subquery results cannot be cached and must be re-evaluated for each row of the outer query

type (shows which table this row of data is about)

The value of type illustrate
system The query object has only a short period of data, the best case
const Based on annotation or unique index query, at most one result is returned
eq_ref When joining tables, scans are done based on primary keys or non-NULL unique indexes
ref Equivalence query based on common index or equivalence join between tables
fulltest Full text search
ref_or_null The table join type is ref, but the scanned index may contain NULL values.
index_merge Leveraging multiple indexes
unique_subquery Subquery using unique index
index_subquery Subquery using normal index
range Using indexes for range queries
index Full index scan

extra (details to resolve the query)

The value of extra illustrate
Using filesort Use external sorting instead of index sorting
Using temporary A temporary table needs to be created to store the structure, which usually occurs when performing a group by on a column that has no index.
Using index Using covering indexes
Using where Use where to process the results
Impossible where The result of the where clause is always false and no data can be selected
Using join buffer In the associated query, the associated field of the driven table has no index
Using index condition Conditionally filter the index before searching for data
Select tables optimized away Use an aggregate function to access a field that has an index

Summarize

This article ends here. I hope it can be helpful to you. I also hope you can pay more attention to more content on 123WORDPRESS.COM!

You may also be interested in:
  • MySql cache query principle and cache monitoring and index monitoring introduction
  • MySQL series 9 MySQL query cache and index
  • A brief discussion on several situations where adding indexes to MySQL does not take effect
  • Rules for using mysql joint indexes
  • MySQL sorting using index scan

<<:  The combination and difference between ENTRYPOINT and CMD in dockerfile

>>:  Introduction to the use of alt and title attributes of HTML img tags

Recommend

Example of deploying Laravel application with Docker

The PHP base image used in this article is: php:7...

CSS container background 10 color gradient Demo (linear-gradient())

grammar background: linear-gradient(direction,col...

JavaScript implements class lottery applet

This article shares the specific code of JavaScri...

Perfect solution to MySQL common insufficient memory startup failure

1. If MySQL is not started successfully, check th...

Uniapp implements DingTalk scan code login sample code

Since Uniapp does not have DingTalk authorization...

How to set password for mysql version 5.6 on mac

MySQL can be set when it is installed, but it see...

How to create users and manage permissions in MySQL

1. How to create a user and password 1. Enter the...

Beginners learn some HTML tags (2)

Related article: Beginners learn some HTML tags (1...

Gallery function implemented by native Js

Table of contents The first The second Native Js ...

In-depth explanation of Set and WeakSet collections in ES6

Table of contents Set is a special collection who...

How to limit the input box to only input pure numbers in HTML

Limit input box to only pure numbers 1、onkeyup = ...

Analysis of the reasons why MySQL field definitions should not use null

Why is NULL so often used? (1) Java's null Nu...

MySQL 8.0.12 winx64 detailed installation tutorial

This article shares the installation tutorial of ...

Detailed steps for installing Harbor, a private Docker repository

The installation of Harbor is pretty simple, but ...