The correct way to use MySQL indexes and detailed explanation of index principles

1. Introduction

Why do we need indexes?

In general application systems, the read-write ratio is about 10:1, and insert operations and general update operations rarely have performance issues. In a production environment, the most common and most problematic operations we encounter are still some complex query operations, so the optimization of query statements is obviously a top priority. When it comes to speeding up queries, we have to mention indexes.

What is an index?

An index, also called a "key" in MySQL, is a data structure that the storage engine uses to find records quickly. Indexes are important for good performance, and as the amount of data in a table grows, their impact on performance becomes more and more significant.

Index optimization should be the most effective means to optimize query performance. Indexes can easily improve query performance by several orders of magnitude.

An index is like the pinyin lookup table of a dictionary: if you want to find a particular character and you do not use the lookup table, you have to search through hundreds of pages one by one.

(The original article illustrates this with a small balanced search tree: 30 at the root; 10 and 40 on the second level; 5, 15, 35, 66 on the third level; and the leaves 1, 6, 11, 19, 21, 39, 55, 100.)

Do you have any misunderstandings about indexing?

Indexing is an important aspect of application design and development. Too many indexes and the application's write performance may suffer; too few and query performance suffers. Finding a balance is crucial to application performance.

Some developers always treat indexes as an afterthought; I have always believed this stems from a flawed development model. If you know how your data will be used, you should add the needed indexes from the beginning. Developers often work with the database at the application level, writing SQL statements and stored procedures, and may not even be aware that indexes exist, or they assume the relevant DBA can simply add them later. DBAs, however, often do not know the business data flow well, and adding indexes after the fact requires monitoring a large number of SQL statements to find the problems. That step certainly takes far longer than adding the index up front would have, and some indexes may still be missed.

Of course, more indexes are not always better. I once ran into a case where iostat on a MySQL server showed disk utilization stuck at 100%; analysis showed the developers had added far too many indexes, and after the unnecessary ones were dropped, disk utilization immediately fell to about 20%. Clearly, adding indexes is a skill in itself.

2. The principle of index

1. Indexing Principle

The purpose of the index is to improve query efficiency, which is the same as the catalog we use to look up books: first locate the chapter, then locate a subsection under the chapter, and then find the page number. Similar examples include: looking up a dictionary, checking train schedules, flight schedules, etc.

The essence is: keep narrowing the range of data that has to be examined until the desired result is filtered out, while turning random lookups into ordered ones. In other words, with this indexing mechanism we can always use the same search procedure to locate the data.

The same is true for databases, but it is obviously much more complicated, because the database faces not only equality queries but also range queries (>, <, between, in), fuzzy queries (like), or conditions, and so on. How should the database handle all of these? Think back to the dictionary example: can we divide the data into segments and then search segment by segment? In the simplest scheme, 1,000 records are divided into segments: records 1 to 100 in the first segment, 101 to 200 in the second, 201 to 300 in the third, and so on. To find record 250, you only need to go to the third segment, eliminating 90% of the irrelevant data at once. But what if there are 10 million records, how many segments should they be divided into? Anyone with a basic knowledge of algorithms will think of a search tree, whose average complexity is O(log N) and which has good query performance. But here we have overlooked a key point: that complexity model assumes the same cost for every comparison. A database implementation is more complicated: the data is stored on disk, and to improve performance part of the data is read into memory for computation on each access. Since the cost of a disk access is roughly 100,000 times that of a memory access, a plain search tree cannot handle these complex application scenarios.

2. Disk IO and Pre-reading

We mentioned disk access above, so here is a brief introduction to disk IO and pre-reading. Reading from a mechanical disk relies on mechanical movement, and the time spent on each read can be divided into three parts: seek time, rotational delay, and transfer time. Seek time is the time needed for the arm to move to the specified track; on mainstream disks it is generally under 5ms. Rotational delay comes from the spindle speed we often hear about: a 7200 rpm disk rotates 7200 times per minute, i.e. 120 times per second, so the average rotational delay is 1/120/2 = 4.17ms. Transfer time is the time to actually read or write the data, generally a few tenths of a millisecond and negligible compared with the first two. So one disk access, that is, one disk IO, takes about 5 + 4.17 = 9ms. That may sound acceptable, but a 500-MIPS (Million Instructions Per Second) machine executes 500 million instructions per second; since instructions run at electronic speed, the time of a single IO is enough to execute roughly 4.5 million instructions. A database often holds hundreds of thousands, millions, or even tens of millions of records, so 9 milliseconds per access is clearly a disaster. (The original article follows this with a chart comparing computer hardware latencies for reference.)

Considering that disk IO is a very expensive operation, the operating system makes some optimizations: during one IO, not only the data at the requested disk address but also the adjacent data is read into the memory buffer, because the principle of locality tells us that when the computer accesses data at one address, the neighboring data is likely to be accessed soon as well. The unit of data read by each IO is called a page. The exact size of a page depends on the operating system, usually 4KB or 8KB; reading data that lies within one page costs only a single IO. This theory is very helpful for the design of the index data structure.
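For reference, InnoDB also does its own paged IO on top of this: its page size is 16KB by default on recent versions, and it can be checked with a query like the one below (available on MySQL 5.6 and later).

# a quick check of the page size InnoDB itself reads and writes with (16KB by default)
mysql> show variables like 'innodb_page_size';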

3. Index data structure

Earlier we discussed the basic principles of indexing, the complexity of databases, and the relevant knowledge of operating systems. The purpose is to make everyone understand that any data structure does not come out of thin air, and it must have its background and usage scenarios. Let's now summarize what we need this data structure to do. In fact, it is very simple, that is: each time you look up data, control the number of disk IO times to a very small order of magnitude, preferably a constant order of magnitude. So we wonder if a highly controllable multi-way search tree can meet the needs? In this way, the b+ tree came into being (the B+ tree evolved from the binary search tree, then the balanced binary tree, and the B tree).

The figure in the original article shows a B+ tree (for a formal definition, see any reference on B+ trees; here we only cover the key points). The light blue blocks are called disk blocks. Each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains the data items 17 and 35 and the pointers P1, P2, and P3: P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, and 99. Non-leaf nodes do not store real data, only the data items that guide the search direction; for example, 17 and 35 do not actually exist in the data table.

B+ tree search process

As shown in the figure, suppose we want to find data item 29. Disk block 1 is loaded from disk into memory first, which is one IO. A binary search in memory determines that 29 lies between 17 and 35, so the P2 pointer of disk block 1 is followed (the in-memory time is negligible compared with the disk IO). Disk block 3 is loaded through the disk address held in P2: the second IO. 29 lies between 26 and 30, so the P2 pointer of disk block 3 is followed and disk block 8 is loaded: the third IO. A binary search in memory then finds 29, and the query ends after a total of three IOs. In practice, a 3-level B+ tree can hold millions of rows; if finding one of millions of rows costs only three IOs, the performance gain is enormous. Without an index, every row would cost an IO, i.e. millions of IOs in total, which is obviously far too expensive.

B+ tree properties

1. The index field should be as small as possible: From the analysis above, we know that the number of IOs depends on the height h of the B+ tree. Suppose the table currently holds N records and each disk block holds m data items; then h = log_(m+1) N. With N fixed, the larger m is, the smaller h is; and m = (disk block size) / (data item size). The disk block size, i.e. the size of one data page, is fixed, so the smaller the space each data item occupies, the more items fit in a block and the lower the tree. This is why each data item, i.e. the index field, should be as small as possible; for example, an int occupies 4 bytes, half of a bigint's 8 bytes. It is also why the B+ tree puts the real data only in the leaf nodes and not in the inner nodes: if real rows were stored in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. In the degenerate case where each inner node holds only one data item, the tree turns into a linked list.
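A rough back-of-the-envelope check (the page size and key width here are assumptions for illustration, not figures from the text): with a 16KB page and about 14 bytes per inner-node entry (an 8-byte bigint key plus a 6-byte pointer), one node holds on the order of m ≈ 1170 entries, so

h = log_(m+1) N ≈ log_1171(10,000,000) ≈ 2.3

that is, a tree of height 3 is already enough for roughly ten million rows.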

2. The leftmost matching feature of the index: When the data item of the b+ tree is a composite data structure, such as (name, age, sex), the b+ tree builds the search tree in order from left to right. For example, when data such as (Zhang San, 20, F) is retrieved, the b+ tree will first compare the name to determine the next search direction. If the name is the same, then the age and sex are compared in turn to finally get the retrieved data; but when data without a name such as (20, F) comes, the b+ tree does not know which node to check next, because the name is the first comparison factor when the search tree is established, and it is necessary to search according to the name first to know where to query next. For example, when retrieving data like (Zhang San, F), the b+ tree can use name to specify the search direction, but the next field age is missing, so it can only find the data with the name equal to Zhang San, and then match the data with the gender of F. This is a very important property, namely the leftmost matching feature of the index.
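A small sketch of this property (the table and index names here are made up for illustration): with a composite index on (name, age, sex), only conditions that start from the leftmost column can use the tree.

# hypothetical table 'person' with a composite index on (name, age, sex)
alter table person add index idx_name_age_sex(name, age, sex);

select * from person where name='Zhang San' and age=20 and sex='F'; -- all three columns used
select * from person where name='Zhang San' and sex='F';            -- only name is used; sex is filtered afterwards
select * from person where age=20 and sex='F';                      -- name is missing: the index cannot be used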

4. Clustered index and auxiliary index

In the database, the height of the B+ tree is generally 2 to 4 layers, which means that it only takes 2 to 4 IOs at most to find a row record of a certain key value, which is quite good. Because the current general mechanical hard disk can perform at least 100 IOs per second, 2 to 4 IOs means that the query time only takes 0.02 to 0.04 seconds.

The B+ tree index in the database can be divided into clustered index and secondary index.

What clustered indexes and auxiliary indexes have in common is that both are B+ trees: height-balanced, with the data stored at the leaf level.

The difference between a clustered index and a secondary index is whether the leaf node stores a whole row of information.

1. Clustered Index

The InnoDB storage engine uses index-organized tables: the data in a table is stored in primary-key order. A clustered index builds a B+ tree on each table's primary key, and its leaf nodes store the row records of the entire table; the leaf nodes of the clustered index are also called data pages. This property of the clustered index means that the data in an index-organized table is itself part of the index.

As in any B+ tree, the data pages are linked to each other through a doubly linked list.

- If no primary key is defined, MySQL takes the first unique (UNIQUE) index whose columns are all NOT NULL, and InnoDB uses it as the clustered index.
- If no such column exists, InnoDB generates a hidden 6-byte row ID by itself and uses it as the clustered index.
- Since the actual data pages can only be sorted according to one B+ tree, each table can have only one clustered index. In many cases the query optimizer prefers the clustered index.

The clustered index can find the data directly at the leaf nodes of its B+ tree. In addition, because it defines the logical order of the data, a clustered index provides particularly fast access for queries over ranges of the key.

One benefit of a clustered index is fast sorting and lookups on the primary key, since the data in the leaf nodes is exactly what the user wants. For example, if a user needs to query the last 10 user records in a table, the leaf level of the B+ tree is a doubly linked list, so the user can jump straight to the last data page and take 10 records.

The second benefit of a clustered index is range queries: to find data within a certain range of the primary key, the intermediate nodes above the leaves give the range of pages, and those data pages can then be read directly, as sketched below.
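A minimal sketch of such a range query (the orders table and its order_id primary key are hypothetical):

# the range on the primary key is resolved in the upper levels of the clustered index,
# and the matching leaf (data) pages are then read off directly
select * from orders where order_id between 1000 and 2000;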

2. Auxiliary index

Except for the clustered index, all other indexes in the table are secondary indexes (also called non-clustered indexes). The difference from the clustered index is that the leaf nodes of the secondary index do not contain all the data of the row record.

In addition to the key value, each leaf node index row also contains a bookmark. The bookmark is used to tell the InnoDB storage engine where to find the row data corresponding to the index.

Since the InnoDB storage engine is an index-organized table, the bookmark of the InnoDB storage engine's auxiliary index is the clustered index key of the corresponding row data. As shown below

The existence of auxiliary indexes does not affect how the data is organized in the clustered index, so each table can have many auxiliary indexes but only one clustered index. When searching through an auxiliary index, the InnoDB storage engine traverses the auxiliary index, obtains the primary key value (the bookmark) at the leaf level, and then finds the complete row record through the clustered (primary key) index.

For example, if you search for data in an auxiliary index tree with a height of 3, you need to traverse the auxiliary index tree 3 times to find the specified primary key. If the height of the clustered index tree is also 3, you need to search the clustered index tree 3 times to finally find the page where the complete row data is located. Therefore, a total of 6 logical IO accesses are required to get the final data page.
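A minimal sketch of this two-step lookup (the table below is hypothetical): the leaf of the secondary index stores (user_name, id), so the query first walks idx_name to obtain the matching primary-key values (the bookmarks) and then goes back to the clustered index on id to fetch the complete rows.

create table users(
    id int primary key,        -- clustered index
    user_name varchar(20),
    age int,
    key idx_name(user_name)    -- secondary (auxiliary) index
);

select * from users where user_name='duoduo';  -- idx_name -> primary key -> clustered index -> full row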

5. MySQL Index Management

1. Function

1. The function of index is to speed up search

2. Primary key, unique, and joint unique in MySQL are also indexes. In addition to speeding up searches, these indexes also have constraints.

2. Commonly used indexes in MySQL

Normal index (INDEX): speeds up searches

Unique indexes:
- Primary key index (PRIMARY KEY): speeds up searches + constraint (not null, no duplicates)
- Unique index (UNIQUE): speeds up searches + constraint (no duplicates)

Combined indexes:
- PRIMARY KEY(id, name): combined primary key index
- UNIQUE(id, name): combined unique index
- INDEX(id, name): combined normal index

3. Two major types of indexes: hash and btree

#When creating any of the indexes above, an index type can be specified. There are two major categories:
Hash index: fast for single-value (equality) lookups, slow for range queries
Btree index: a B+ tree; each additional level multiplies the amount of data it can hold (this is the one we normally use, and the one InnoDB supports by default)

#Different storage engines support different index types
InnoDB supports transactions, row-level locking, B-tree, Full-text and other indexes, but does not support Hash indexes.
MyISAM does not support transactions, but supports table-level locking, B-tree, Full-text and other indexes, but does not support Hash indexes;
Memory does not support transactions, but supports table-level locking, B-tree, Hash and other indexes, but does not support Full-text indexes;
NDB supports transactions, row-level locking, and hash indexes, but does not support B-tree, Full-text, and other indexes.
Archive does not support transactions, but supports table-level locking. It does not support B-tree, Hash, Full-text, and other indexes.
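As a small illustration of the index-type syntax (the table name is made up): the MEMORY engine accepts a HASH index, while InnoDB silently builds a BTREE index even if HASH is requested.

# MEMORY tables support hash indexes
create table session_cache(
    sid char(32),
    data varchar(255),
    key idx_sid(sid) using hash
) engine=memory;

# on an InnoDB table the same USING HASH clause is accepted syntactically, but a B-tree index is built instead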

4. Syntax for creating/deleting indexes

#Method 1: create the index when creating the table
CREATE TABLE table_name (
    field_name_1 data_type [integrity constraints...],
    field_name_2 data_type [integrity constraints...],
    [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
    [index_name] (field_name [(length)] [ASC | DESC])
);


#Method 2: CREATE INDEX on an existing table
CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
    ON table_name (field_name [(length)] [ASC | DESC]);


#Method 3: ALTER TABLE ... ADD INDEX on an existing table
ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
    index_name (field_name [(length)] [ASC | DESC]);

#Delete an index:
DROP INDEX index_name ON table_name;
#Method 1
create table t1(
    id int,
    name char,
    age int,
    sex enum('male','female'),
    unique key uni_id(id),
    index ix_name(name)  # index has no key
);


#Method 2
create index ix_age on t1(age);

#Method 3
alter table t1 add index ix_sex(sex);

#View
mysql> show create table t1;
| t1 | CREATE TABLE `t1` (
 `id` int(11) DEFAULT NULL,
 `name` char(1) DEFAULT NULL,
 `age` int(11) DEFAULT NULL,
 `sex` enum('male','female') DEFAULT NULL,
 UNIQUE KEY `uni_id` (`id`),
 KEY `ix_name` (`name`),
 KEY `ix_age` (`age`),
 KEY `ix_sex` (`sex`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

6. Test Index

1. Preparation

#1. Prepare the table
create table s1(
    id int,
    name varchar(20),
    gender char(6),
    email varchar(50)
);

#2. Create a stored procedure that inserts records in batches
delimiter $$        # declare $$ as the statement terminator for the procedure body
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,'duoduo','male',concat('duoduo',i,'@oldboy'));
        set i=i+1;
    end while;
END$$               # $$ ends the procedure body
delimiter ;         # re-declare the semicolon as the statement terminator

#3. View the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure
call auto_insert1();

#Wait a while; how long depends on the machine's performance

Tip: how long the inserts take depends on your machine's performance, so please be patient!

2. Test query speed without index

#Without an index: MySQL has no idea whether a record with id equal to 333333333 exists, so it can only scan the table from beginning to end. There are as many IO operations as there are disk blocks, so the query is very slow.
mysql> select * from s1 where id=333333333;
Empty set (0.33 sec)

3. If a large amount of data already exists in the table, creating an index on a field will be very slow

4. After the index is established, the query speed is significantly improved when the field is used as the query condition

PS:

1. MySQL first searches the index table based on the B+ tree search principle and quickly finds that the record with id equal to 333333333 does not exist. The IO is greatly reduced, so the speed is significantly improved.

2. We can find the table in the data directory of mysql and see that it occupies more hard disk space.

3. Please note that as shown below

5. Conclusion

#1. Be sure to create an index on the fields used in search conditions. For example, for select * from s1 where id = 333; you need an index on id.

#2. When the table already contains a large amount of data, creating an index is slow and takes up disk space, but once the index exists, queries on that field become fast. For example, create index idx on s1(id); scans all the data in the table, builds an index structure keyed on id, and stores it on the hard disk alongside the table. After it is built, queries are very fast.

#3. Note that the index of the innodb table will be stored in the s1.ibd file, while the index of the myisam table will have a separate index file table1.MYI

MyISAM index files and data files are separate, and the index file stores only the addresses of the data records.
In InnoDB, the table data file itself is an index structure organized as a B+ tree (a balanced tree), and the data field of its leaf nodes stores the complete data records.
The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
Because the data files of InnoDB are clustered according to the primary key, InnoDB requires that the table must have a primary key (MyISAM does not need one).
If not explicitly defined, the MySQL system will automatically select a column that can uniquely identify the data record as the primary key.
If this column does not exist, MySQL will automatically generate an implicit field for the InnoDB table as the primary key. The length of this field is 6 bytes and the type is long integer.

7. Use indexes correctly

1. Cases where the index is not hit

Creating an index does not by itself guarantee faster queries. To get the expected speed-up from an index, we must pay attention to the following issues when adding and using indexes.

1. Range conditions, or conditions that are not precise. The following symbols or keywords appear in the condition:

- greater than / less than: >, >=, <, <=
- not equal: !=
- between ... and ...
- like

2. Try to choose columns with high discrimination as indexes. The formula for discrimination is count(distinct col)/count(*), the ratio of distinct values to rows: the larger the ratio, the fewer records have to be scanned. The discrimination of a unique key is 1, while for status or gender fields it may approach 0 on large data sets. Is there an empirical threshold for this ratio? It is hard to fix across scenarios; generally we want the value for a join column to be above 0.1, i.e. on average each lookup scans about 10 records. A quick way to measure discrimination is shown right after this paragraph.
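A rough measurement, using the column names of the s1 table from the test section above:

# discrimination (selectivity) = count(distinct col)/count(*); the closer to 1, the better the index candidate
select count(distinct id)/count(*)     as id_selectivity,
       count(distinct name)/count(*)   as name_selectivity,
       count(distinct gender)/count(*) as gender_selectivity
from s1;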

# First delete all indexes from the table so that we can focus on the problem of discrimination
mysql> desc s1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | MUL | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(5) | YES | | NULL | |
| email | varchar(50) | YES | MUL | NULL | |
+--------+-------------+------+-----+---------+-------+
rows in set (0.00 sec)

mysql> drop index a on s1;
Query OK, 0 rows affected (0.20 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> drop index d on s1;
Query OK, 0 rows affected (0.18 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> desc s1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(5) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
rows in set (0.00 sec)

First, delete all the indexes in the table and let's focus on the problem of discrimination.

We write a stored procedure to add batch records to table s1. The value of the name field is duoduo, which means that the name field has a low degree of distinction (the same is true for the gender field, which we will deal with later).

Recall the structure of the b+ tree. The query speed is inversely proportional to the height of the tree. To keep the height of the tree very low, it is necessary to ensure that: in a certain layer, the data items are arranged in order from left to right and from small to large, that is, left 1 < left 2 < left 3 < ...

For fields with low discrimination, it is impossible to find the size relationship because the values ​​are all equal. Undoubtedly, if you still want to use b+ tree to store these equal-valued data, you can only increase the height of the tree. The lower the discrimination of the field, the higher the height of the tree. In extreme cases, when the values ​​of the index fields are the same, the b+ tree becomes almost a stick. This is an extreme case in this example. All values ​​in the name field are 'duoduo'

#Now we come to a conclusion: if we create an index on a field with low discrimination, the index tree will be very tall. What concrete impact does that have?

#1: If the condition is name='xxxx', it can be determined immediately that 'xxxx' is not in the index tree (because every value in the tree is 'duoduo'), so the query is very fast.

#2: If the condition is name='duoduo', the query can never narrow down to a clear range from any position in the tree; it can only keep walking down and down. The number of IOs is not much different from a full table scan, so it is slow.


3. = and in can be in any order, for example, a = 1 and b = 2 and c = 3. You can create an (a, b, c) index in any order, and the MySQL query optimizer will help you optimize it into a form that the index can recognize.

4. Index columns cannot be used inside calculations; keep the columns "clean". For example, with from_unixtime(create_time) = '2014-05-29' the index cannot be used. The reason is simple: the B+ tree stores the raw field values from the data table, but to apply this condition the function would have to be evaluated for every row before comparison, which is obviously too costly. The statement should therefore be written as create_time = unix_timestamp('2014-05-29'), as in the sketch below.
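Side by side (tb1 here stands for any table with an indexed integer create_time column; the names are only for illustration):

# the function wraps the indexed column, so every row has to be evaluated: the index is not used
select * from tb1 where from_unixtime(create_time) = '2014-05-29';

# the column stays "clean" and the constant is converted instead: the index can be used
select * from tb1 where create_time = unix_timestamp('2014-05-29');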

5. and/or

#1. The logic of and / or
condition1 and condition2: all conditions must hold for the whole expression to be true; if any one fails, the result is false.
condition1 or condition2: the result is true as long as one of the conditions holds.

#2. How and works
condition:
a = 10 and b = 'xxx' and c > 3 and d = 4
index:
Create a joint index (d, a, b, c)
Working principle:
For multiple consecutive ands: MySQL will find an index field with high discrimination from left to right according to the joint index (so that a small range can be quickly locked) to speed up the query, that is, in the order of d->a->b->c

#3. How or works
condition:
a = 10 or b = 'xxx' or c > 3 or d = 4
index:
Create a joint index (d, a, b, c)

Working principle:
For multiple consecutive ors: MySQL will judge from left to right in the order of the conditions, i.e. a->b->c->d

Even when a condition on the left is satisfied, if its index field has low discrimination (such as name), the index does little to speed up the query.

6. The leftmost prefix matching principle is a very important rule. For a combined index, MySQL keeps matching columns to the right until it meets a range condition (>, <, between, like) and then stops matching (a large range means the index helps less). For example, with a = 1 and b = 2 and c > 3 and d = 4 and an index created in the order (a, b, c, d), d will not be able to use the index; with an index created in the order (a, b, d, c), all four columns can be used, and the order of a, b, d in the query can be arbitrary. A sketch follows below.
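A sketch of the two column orders discussed above (the table t and its columns here are hypothetical):

# with index (a, b, c, d): matching stops at the range condition on c, so d cannot use the index
alter table t add index idx_abcd(a, b, c, d);
select * from t where a=1 and b=2 and c>3 and d=4;

# with index (a, b, d, c): the equality columns a, b, d are matched first (the optimizer can
# rearrange their order) and the range on c comes last, so all four columns can be used
alter table t add index idx_abdc(a, b, d, c);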

7. Other situations

- Using a function on the column
  select * from tb1 where reverse(email) = 'duoduo';

- Type inconsistency: if the column is a string type, the condition value must be quoted, otherwise...
  select * from tb1 where email = 999;

- order by
  # If the sort column is indexed, the selected fields must also be index fields, otherwise the index is not hit
  select name from s1 order by email desc;
  # When sorting by an indexed column, if the selected field is not indexed, the query is still slow
  select email from s1 order by email desc;
  # Special case: sorting by the primary key is still fast
  select * from tb1 order by nid desc;

- The leftmost prefix of the combined index. If the combined index is (name, email):
  name and email  -- hits the index
  name            -- hits the index
  email           -- misses the index

- count(1) or count(column) instead of count(*): makes no difference in MySQL

- create index xxxx on tb(title(19))  # for text types, a length must be specified

Other considerations

- Avoid using select *
- count(1) or count(column) instead of count(*)
- When creating a table, try to use char instead of varchar
- The order of the fields in the table is fixed length fields first
- Composite indexes instead of multiple single-column indexes (when multiple conditions are often used for query)
- Try to use short indexes
- Use JOIN instead of sub-queries
- When joining tables, make sure the condition type is consistent
- Columns with very few distinct values (low discrimination) are not suitable for indexing, e.g. gender

8. Joint Index and Covering Index

1. Joint Index

A joint index refers to combining multiple columns on a table into one index. The method of creating a joint index is the same as that of creating a single index. The only difference is that there are multiple index columns, as follows

mysql> create table t(
 -> a int,
 -> b int,
 -> primary key(a),
 -> key idx_a_b(a,b)
 -> );
Query OK, 0 rows affected (0.11 sec)

So when do you need a joint index? Before discussing that, let's look at the internal structure of a joint index. In essence, a joint index is still a B+ tree; the difference is that the number of columns in its key is not 1 but 2 or more. Next, consider a joint index made up of two integer columns, and assume the two key columns are named a and b, as shown in the figure.

You can see that this is no different from the single-key B+ tree we saw before. The key values ​​are sorted, and all data can be read logically and sequentially through the leaf nodes. For the example above, (1,1), (1,2), (2,1), (2,4), (3,1), (3,2), the data is stored in the order of (a,b).

Therefore, for the query select * from table where a=xxx and b=xxx, it is obvious that the joint index (a,b) can be used. For the query select * from table where a=xxx on a single column a, the index (a,b) can also be used.

But for the query of column b, select * from table where b=xxx, the (a,b) index cannot be used. In fact, it is not difficult to find the reason. The values ​​of b on the leaf nodes are 1, 2, 1, 4, 1, 2, which are obviously not sorted. Therefore, the (a,b) index cannot be used for the query of column b.
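Using the table t with KEY idx_a_b(a, b) created above, the three cases look roughly like this (note that a is also the primary key there, so the optimizer may prefer the primary key for the single-column query on a):

select * from t where a=1 and b=2;  -- can use idx_a_b (both columns)
select * from t where a=1;          -- can use the leftmost column of idx_a_b
select * from t where b=2;          -- cannot use idx_a_b: b alone is not in sorted order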

The second benefit of a joint index is that when the first key is the same, the second key has already been sorted. For example, in many cases, an application needs to query a user's shopping history, sort by time, and finally retrieve the three most recent purchase records. In this case, using a joint index can help us avoid an extra sorting operation because the index itself has already been sorted at the leaf node, as shown below.

#===========Prepare the table==============
create table buy_log(
 userid int unsigned not null,
 buy_date date
);

insert into buy_log values
(1,'2009-01-01'),
(2,'2009-01-01'),
(3,'2009-01-01'),
(1,'2009-02-01'),
(3,'2009-02-01'),
(1,'2009-03-01'),
(1,'2009-04-01');

alter table buy_log add key(userid);
alter table buy_log add key(userid,buy_date);

#==========Verification==============
mysql> show create table buy_log;
| buy_log | CREATE TABLE `buy_log` (
 `userid` int(10) unsigned NOT NULL,
 `buy_date` date DEFAULT NULL,
 KEY `userid` (`userid`),
 KEY `userid_2` (`userid`,`buy_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

#You can see that possible_keys lists two usable indexes here: the single-column index userid and the joint index userid_2. The optimizer finally chooses userid because its leaf nodes contain a single key value, so in theory one page can hold more records.
mysql> explain select * from buy_log where userid=2;
+----+-------------+---------+------+-----------------+--------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+-----------------+--------+---------+-------+------+-------+
| 1 | SIMPLE | buy_log | ref | userid,userid_2 | userid | 4 | const | 1 | |
+----+-------------+---------+------+-----------------+--------+---------+-------+------+-------+
row in set (0.00 sec)

#Next, suppose you want to retrieve the three most recent purchase records for userid 1. This time the joint index userid_2 is used, because within this index, for userid=1, buy_date is already sorted.
mysql> explain select * from buy_log where userid=1 order by buy_date desc limit 3;
+----+-------------+---------+------+-----------------+----------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+-----------------+----------+---------+-------+------+--------------------------+
| 1 | SIMPLE | buy_log | ref | userid,userid_2 | userid_2 | 4 | const | 4 | Using where; Using index |
+----+-------------+---------+------+-----------------+----------+---------+-------+------+--------------------------+
row in set (0.00 sec)

#ps: If Extra shows "Using filesort", it means an extra sort is required after the data is found.
#For the joint index (a, b), the following statement can use the index directly, with no extra sort:
select ... from table where a=xxx order by b;

#For the joint index (a, b, c), the following statements can also get their results directly through the index:
select ... from table where a=xxx order by b;
select ... from table where a=xxx and b=xxx order by c;

#However, for the joint index (a, b, c), the following statement cannot get the result directly through the index and needs a filesort, because (a, c) is not a sorted prefix of the index:
select ... from table where a=xxx order by c;

2. Covering Index

The InnoDB storage engine supports covering indexes (or index covering), that is, query records can be obtained from auxiliary indexes without querying records in the clustered index.

One benefit of using a covering index is that the auxiliary index does not contain all the information of the entire row record, so its size is much smaller than the clustered index, thus reducing a large number of IO operations.

Note: Covering index technology was first completed and implemented in the InnoDB Plugin, which means that for InnoDB versions less than 1.0 or MySQL database versions less than 5.0, the InnoDB storage engine does not support covering index features.

For the auxiliary index of the InnoDB storage engine, since it contains the primary key information, the data stored in its leaf node is (primary key1, primary key2, ..., key1, key2, ...). For example

select age from s1 where id=123 and name = 'duoduo';  # The id field has an index but the name field does not: this SQL hits the index but is not covered by it, so the detailed row still has to be fetched from the clustered index.

# The best case is that the index covers every field involved, so the whole query can be answered from the index alone.
mysql> desc s1;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | NO | | NULL | |
| name | varchar(20) | YES | | NULL | |
| gender | char(6) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
rows in set (0.21 sec)

#No index yet
mysql> explain select name from s1 where id=1000;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | s1 | NULL | ALL | NULL | NULL | NULL | NULL | 2688336 | 10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
row in set, 1 warning (0.00 sec)

#Create the index
mysql> create index idx_id on s1(id);
Query OK, 0 rows affected (4.16 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> explain select name from s1 where id=1000; #The auxiliary index is hit, but the index is not covered. You also need to find name from the clustered index
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+------+------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+------+------+
| 1 | SIMPLE | s1 | NULL | ref | idx_id | idx_id | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+------+------+
row in set, 1 warning (0.08 sec)

#All the needed information is found in the auxiliary index itself; "Using index" in Extra means a covering index
mysql> explain select id from s1 where id=1000;
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | s1 | NULL | ref | idx_id | idx_id | 4 | const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------------+
row in set, 1 warning (0.03 sec)

Another benefit of covering indexes shows up in certain statistical queries. Using the buy_log table created in the previous subsection, the query plan is as follows

mysql> explain select count(*) from buy_log;
+----+-------------+---------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | buy_log | index | NULL | userid | 4 | NULL | 7 | Using index |
+----+-------------+---------+-------+---------------+--------+---------+------+------+-------------+
row in set (0.00 sec)

The InnoDB storage engine does not choose the clustered index to perform this count. Since the buy_log table also has secondary indexes, which are much smaller than the clustered index, choosing a secondary index reduces IO; that is why the optimizer picks the userid secondary index as the key above.

For a joint index of the form (a, b), a query whose condition is only on b generally cannot use the index. However, if the query is a statistical operation and the index is covering, the optimizer will still choose to use it, as follows

#The joint index is userid_2 (userid, buy_date). Normally we cannot use this index for a condition on buy_date alone, but in this special case the query is a statistical operation and the index is covering, so the joint index can still be used even though buy_date is the only query condition.
mysql> explain select count(*) from buy_log where buy_date >= '2011-01-01' and buy_date < '2011-02-01';
+----+-------------+---------+-------+---------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+----------+---------+------+------+--------------------------+
| 1 | SIMPLE | buy_log | index | NULL | userid_2 | 8 | NULL | 7 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+------+--------------------------+
row in set (0.00 sec)

9. Query Optimization Tool - explain

I believe everyone is familiar with the explain command; for detailed usage and the meaning of each field, refer to the official explain-output documentation. What should be emphasized here is that rows is the core indicator: a statement with a small rows value will almost always execute quickly (with exceptions, discussed below). So optimizing a statement is largely a matter of reducing rows.

Execution plan: lets MySQL estimate how it will execute the query (usually accurate)
Access types, from worst to best:
all < index < range < index_merge < ref_or_null < ref < eq_ref < system/const

slow (assuming indexes exist only on id and email):
select * from userinfo3 where name='alex'
explain select * from userinfo3 where name='alex'
type: ALL (full table scan)
select * from userinfo3 limit 1;

fast:
select * from userinfo3 where email='alex'
type: const (uses the index)

Reference: https://www.jb51.net/article/140759.htm

10. Basic steps for slow query optimization

0. Run the statement first to see whether it is really slow, and set SQL_NO_CACHE (see the example after this list)
1. Single-table query with the where condition, locking the table that returns the fewest records. That is, apply the where clause of the query to the table that returns the smallest number of records first, then check each field of that table separately to see which has the highest discrimination.
2. Explain to check whether the execution plan is consistent with the expectation in 1 (start querying from the table with fewer locked records)
3. For SQL of the form order by ... limit, let the sorted table be searched first
4. Understand the business side’s usage scenarios
5. Refer to the major principles of indexing when adding indexes
6. Observe the results. If they do not meet expectations, continue to analyze from 0.
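A minimal example of step 0, using the s1 table from the earlier sections (the SQL_NO_CACHE hint applies to MySQL 5.x; the query cache, and with it this hint, was removed in MySQL 8.0):

# time the statement while bypassing the query cache
mysql> select sql_no_cache * from s1 where email='duoduo333@oldboy';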

11. Slow log management

Slow log
- Execution time > 10 seconds (the default long_query_time)
- Queries that do not use an index
- Log file path

Configuration:
- Memory
show variables like '%query%';
show variables like '%queries%';
set global variable name = value
- Configuration File
mysqld --defaults-file='E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\my-default.ini'

my.conf content:
slow_query_log = ON
slow_query_log_file = D:/....

Note: After modifying the configuration file, you need to restart the service
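The same switches can also be flipped at runtime with the "set global variable name = value" form mentioned above (such settings are lost on restart; the file path below is only a placeholder):

mysql> set global slow_query_log = on;
mysql> set global slow_query_log_file = '/var/lib/mysql/slow.log';
mysql> set global long_query_time = 3;
mysql> show variables like '%slow_query%';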

MySQL log management

========================================================

Error log: records the startup, shutdown and runtime errors of the MySQL server.
Binary log: also known as the binlog; records, in binary form, every operation on the database other than SELECT.
Query log: records query information.
Slow query log: records operations whose execution time exceeds the specified threshold.
Relay log: the replica copies the master's binary log into its own relay log so it can replay it locally.
General log: audits which account did what, and when.
Transaction log (redo log): records InnoDB transaction-related information such as transaction execution times and checkpoints.
========================================================

1. bin-log

1. Enable

# vim /etc/my.cnf
[mysqld]
log-bin[=dir\[filename]]
# service mysqld restart

2. Pause

//Current session only
SET SQL_LOG_BIN=0;
SET SQL_LOG_BIN=1;

3. View

View All:

# mysqlbinlog mysql.000002
By time:
# mysqlbinlog mysql.000002 --start-datetime="2012-12-05 10:02:56"
# mysqlbinlog mysql.000002 --stop-datetime="2012-12-05 11:02:54"
# mysqlbinlog mysql.000002 --start-datetime="2012-12-05 10:02:56" --stop-datetime="2012-12-05 11:02:54"

By bytes:

# mysqlbinlog mysql.000002 --start-position=260
# mysqlbinlog mysql.000002 --stop-position=260
# mysqlbinlog mysql.000002 --start-position=260 --stop-position=930

4. Truncate bin-log (generate new bin-log file)

a. Restart MySQL server

b. # mysql -uroot -p123 -e 'flush logs'

5. Delete bin-log files

# mysql -uroot -p123 -e 'reset master'

2. Query log

Enable general query log

# vim /etc/my.cnf
[mysqld]
log[=dir\[filename]]
# service mysqld restart

3. Slow query log

Enable slow query log

# vim /etc/my.cnf
[mysqld]
log-slow-queries[=dir\[filename]]
long_query_time=n
# service mysqld restart

MySQL 5.6:

slow-query-log=1
slow-query-log-file=slow.log
long_query_time=3

View slow query logs

test:

BENCHMARK(count,expr)
SELECT BENCHMARK(50000000,2*3);

Summary

This is the full content of this article. I hope it provides some reference value for your study or work. If you have any questions, feel free to leave a message. Thank you for your support of 123WORDPRESS.COM.
