MySQL index principle and query optimization detailed explanation

MySQL index principle and query optimization detailed explanation

1. Introduction

1. What is an index?

In general application systems, the read-write ratio is about 10:1, and insert operations and general update operations rarely have performance issues. In a production environment, the most common and most problematic operations we encounter are still some complex query operations, so the optimization of query statements is obviously a top priority. When it comes to speeding up queries, we have to mention indexes.

2. Why do we need indexes?

An index, also called a "key" in MySQL, is a data structure used by the storage engine to quickly find records. Indexes are important for good performance

Especially when the amount of data in the table grows, the impact of index on performance becomes more and more important.

Index optimization should be the most effective means to optimize query performance. Indexes can easily improve query performance by several orders of magnitude.

The index is equivalent to the phonetic table of the dictionary. If you want to look up a certain word, if you don’t use the phonetic table, you need to search through hundreds of pages one by one.

2. The principle of index

One index principle

The purpose of the index is to improve query efficiency, which is the same as the catalog we use to look up books: first locate the chapter, then locate a subsection under the chapter, and then find the page number. Similar examples include: looking up a dictionary, checking train schedules, flight schedules, etc.

The essence is: to filter out the final desired result by continuously narrowing the scope of the data to be obtained, and at the same time turn random events into sequential events. In other words, with this indexing mechanism, we can always use the same search method to lock the data.

The same is true for databases, but it is obviously much more complicated because it not only faces equality queries, but also range queries (>、<、between、in) , fuzzy queries (like), union queries (or), and so on. How should the database choose to deal with all the problems? Let’s think back to the dictionary example. Can we divide the data into segments and then query them in segments? The simplest way is to divide 1,000 pieces of data into the first section with numbers 1 to 100, the second section with numbers 101 to 200, and the third section with numbers 201 to 300. Then, to check the 250th piece of data, you only need to find the third section, thus eliminating 90% of the invalid data at once. But what if there are 10 million records, how many segments should they be divided into? Students who have a basic knowledge of algorithms will think of the search tree, which has an average complexity of lgN and has good query performance. But here we have overlooked a key issue, the complexity model is based on the cost of the same operation each time. The database implementation is relatively complex. On the one hand, the data is stored on the disk. On the other hand, in order to improve performance, part of the data can be read into the memory for calculation each time. Because we know that the cost of accessing the disk is about 100,000 times that of accessing the memory, a simple search tree is difficult to meet complex application scenarios.

2. Disk IO and Pre-reading

Considering that disk IO is a very expensive operation, the computer operating system has made some optimizations. During an IO, not only the data at the current disk address, but also the adjacent data are read into the memory buffer . This is because the principle of local pre-reading tells us that when the computer accesses data at an address, the data adjacent to it will also be accessed quickly. The data read each time by IO is called a page. The specific size of a page depends on the operating system, which is usually 4k or 8k. That is, when we read data in a page, only one IO actually occurs. This theory is very helpful for the design of index data structure.

3. Index data structure

Any data structure does not come out of thin air. It must have its background and usage scenarios. Let's summarize what we need this data structure to do. In fact, it is very simple, that is: each time you look up data, control the number of disk IO times to a very small order of magnitude, preferably a constant order of magnitude. So we wonder if a highly controllable multi-way search tree can meet the needs? In this way, b+ tree came into being.

As shown in the figure above, it is a b+ tree. For the definition of b+ tree, please refer to B+ tree. Here we will only talk about some key points. The light blue block is called a disk block. You can see that each disk block contains several data items (shown in dark blue) and pointers (shown in yellow). For example, disk block 1 contains data items 17 and 35, and contains pointers P1, P2, and P3. P1 represents a disk block less than 17, P2 represents a disk block between 17 and 35, and P3 represents a disk block greater than 35. The real data exists in the leaf nodes 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, and 99. Non-leaf nodes do not store real data, but only store data items that guide the search direction. For example, 17 and 35 do not actually exist in the data table.

###b+ tree search process

As shown in the figure, if you want to find data item 29, disk block 1 will be loaded from the disk to the memory first. At this time, an IO occurs. A binary search is used in the memory to determine that 29 is between 17 and 35. The P2 pointer of disk block 1 is locked. The memory time is very short (compared to the disk IO) and can be ignored. Disk block 3 is loaded from the disk to the memory through the disk address of the P2 pointer of disk block 1. The second IO occurs. 29 is between 26 and 30. The P2 pointer of disk block 3 is locked. Disk block 8 is loaded into the memory through the pointer. The third IO occurs. At the same time, a binary search is performed in the memory to find 29, and the query ends. A total of three IOs are performed. The reality is that a 3-layer b+ tree can represent millions of data. If searching millions of data only requires three IOs, the performance improvement will be huge. If there is no index, each data item will require an IO, then a total of millions of IOs will be required, which is obviously very costly.

###b+Tree Properties

1. The index field should be as small as possible : Through the above analysis, we know that the number of IO times depends on the height h of b+number. Assuming that the data in the current data table is N, and the number of data items in each disk block is m, then h=㏒(m+1)N. When the data volume N is constant, the larger m is, the smaller h is; and m = disk block size/data item size. The disk block size is the size of a data page, which is fixed. If the space occupied by the data item is smaller and the number of data items is greater, the height of the tree is lower. This is why each data item, i.e. the index field, should be as small as possible. For example, int occupies 4 bytes, which is half of bigint 8 bytes. This is why the b+ tree requires that the real data be placed in the leaf nodes rather than the inner nodes. Once placed in the inner nodes, the data items of the disk blocks will drop significantly, causing the tree to increase in height. When the data item is equal to 1, it will degenerate into a linear list.

2. The leftmost matching feature of the index (i.e. matching from left to right) : When the data item of the b+ tree is a composite data structure, such as (name, age, sex), the b+ tree builds the search tree in order from left to right. For example, when data such as (Zhang San, 20, F) is retrieved, the b+ tree will first compare the name to determine the next search direction. If the names are the same, then age and sex are compared in turn to finally get the retrieved data. However, when data without a name such as (20, F) comes, the b+ tree does not know which node to check next, because name is the first comparison factor when building the search tree, and it is necessary to search based on the name first to know where to query next. For example, when retrieving data like (Zhang San, F), the b+ tree can use name to specify the search direction, but the next field age is missing, so it can only find the data with the name equal to Zhang San, and then match the data with the gender of F. This is a very important property, namely the leftmost matching feature of the index.

4. Mysql index management

1. Function

#1. The function of index is to speed up search

#2. Primary key, unique, and joint unique in MySQL are also indexes. In addition to speeding up searches, these indexes also have constraints.

2. MySQL index classification

Index classification 1. Ordinary index index: accelerate search 2. Unique index primary key index: primary key: accelerate search + constraint (not empty and unique)
    Unique index: unique: speed up search + constraint (unique)
3. Combined index -primary key(id,name): Combined primary key index -unique(id,name): Combined unique index -index(id,name): Combined ordinary index 4. Full-text index fulltext: It is most effective when used to search for a very long article.
5. Spatial index: just understand it, almost never used
1 For example, you are making a membership card system for a shopping mall.
 2 
 3 This system has a member table 4 with the following fields:
 5 Member Number INT
 6 Member name VARCHAR(10)
 7 Member ID number VARCHAR(18)
 8 Member phone number VARCHAR(10)
 9 Member's address VARCHAR(50)
10 Member Remarks TEXT
11 
12 Then this membership number is used as the primary key, using PRIMARY
13 If you want to create an index for member names, then it is a normal INDEX
14 If you want to create an index for the member ID number, you can choose UNIQUE (unique, no duplication allowed)
15 
16 #In addition, there is a full-text index, namely FULLTEXT
17 Member notes information. If you need to create an index, you can choose full-text search.
18 works best when searching very long articles.
19 is used for shorter texts. If it is just one or two lines, a normal INDEX will also work.
20 But in fact, for full-text search, we will not use the index that comes with MySQL, but will choose third-party software such as Sphinx, which is specifically for full-text search.
twenty one 
22 #Other indexes such as spatial index SPATIAL, just understand them, there are almost no application scenarios where each index is not used

3. Two major types of indexes: hash and btree

#When creating the above index, we can specify the index type for it. There are two types of hash type indexes: fast single query and slow range query. Btree type index: B+ tree, the more layers, the exponential growth of data volume (we use it because InnoDB supports it by default)
#Different storage engines support different index types. InnoDB supports transactions, row-level locking, B-tree, Full-text and other indexes, but does not support Hash indexes.
MyISAM does not support transactions, but supports table-level locking, B-tree, Full-text and other indexes, but does not support Hash indexes;
Memory does not support transactions, but supports table-level locking, B-tree, Hash and other indexes, but does not support Full-text indexes;
NDB supports transactions, row-level locking, and hash indexes, but does not support B-tree, Full-text, and other indexes.
Archive does not support transactions, but supports table-level locking. It does not support B-tree, Hash, Full-text, and other indexes.

4. Syntax for creating/deleting indexes

1 #Method 1: When creating a table 2 CREATE TABLE table name (
 3 Field Name 1 Data Type [Integrity Constraints…],
 4 Field Name 2 Data Type [Integrity Constraints…],
 5 [UNIQUE | FULLTEXT | SPATIAL ] INDEX | KEY
 6 [index name] (field name [(length)] [ASC | DESC]) 
 7 );
 8 
 9 
10 #Method 2: CREATE creates an index on an existing table 11 CREATE [UNIQUE | FULLTEXT | SPATIAL ] INDEX index name 12 ON table name (field name [(length)] [ASC | DESC]);
13 
14 
15 #Method 3: ALTER TABLE Create an index on an existing table 16 ALTER TABLE table name ADD [UNIQUE | FULLTEXT | SPATIAL ] INDEX
17 index name (field name [(length)] [ASC | DESC]);
18                              
19 #Delete index: DROP INDEX index name ON table name;
Syntax for creating/deleting an index

Syntax for creating/deleting an index

Make good use of the help document help create
help create index
==================
1. Create an index - create it when creating the table (a few points to note)
    create table s1(
    id int, #You can add primary key here
    #id int index #You cannot add an index like this, because index is just an index, not a constraint.
    #You cannot add an index name char(20) when defining a field, like a primary key or unique constraint.
    age int,
    email varchar(30)
    #primary key(id) #You can also add index(id) #You can add it like this);
    -After creating the table, create index name on s1(name); #Add a common indexcreate unique age on s1(age);Add a unique indexalter table s1 add primary key(id); #Add a housing and construction index, that is, add a primary key constraint to the id fieldcreate index name on s1(id,name); #Add a common joint index2. Delete the indexdrop index id on s1;
    drop index name on s1; #Delete a common indexdrop index age on s1; #Delete a unique index, just like a common index, you don't need to add unique in front of index to delete it, you can delete it directlyalter table s1 drop primary key; #Delete the primary key (because it was added according to alter, so we also use alter to delete it)

Help View

5. Test Index

1. Preparation

#1. Prepare table create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);
#2. Create a stored procedure to insert records in batches delimiter $$ #Declare the end symbol of the stored procedure to be $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy'));
        set i=i+1;
    end while;
END$$ #$$end delimiter; #Re-declare the semicolon as the end symbol#3. View the stored procedure show create procedure auto_insert1\G 
#4. Call the stored procedure call auto_insert1();

2. Test query speed without index

#No index: Scan from beginning to end, so the query speed is very slow mysql> select * from s1 where id=333;
+------+---------+--------+----------------+
| id | name | gender | email |
+------+---------+--------+----------------+
| 333 | egon333 | male | [email protected] |
| 333 | egon333 | f | alex333@oldboy |
| 333 | egon333 | f | alex333@oldboy |
+------+---------+--------+----------------+
rows in set (0.32 sec)
mysql> select * from s1 where email='egon333@oldboy';
....
... rows in set (0.36 sec)

3. Add index

#1. Be sure to create an index for the search condition field, for example, select * from t1 where age > 5; then you need to add an index for age. #2. If there is already a large amount of data in the table, creating an index will be very slow and take up hard disk space. Insertion, deletion and update are all very slow. Only query is fast. For example, create index idx on s1(id); will scan all the data in the table, and then use id as the data item to create an index structure and store it in the table on the hard disk.
After the creation, the query will be very fast #3. It should be noted that the index of the innodb table will be stored in the s1.ibd file, while the index of the myisam table will have a separate index file table1.MYI

6. Use indexes correctly

1. Covering Index

#Analysis select * from s1 where id=123;
The SQL hits the index but does not cover it.
Use id=123 to locate the location of the id in the hard disk, or in the data table, in the index data structure.
However, the field we selected is *, and we need other fields besides id, which means that it is not enough to get the id through the index structure.
We also need to use the id to find the other field values ​​of the row where the id is located, which takes time. Obviously, if we only select the id,
This trouble is eliminated, as follows: select id from s1 where id=123;
This is a covering index. The index is hit, and the address of the id on the hard disk is directly obtained from the index data structure. The speed is very fast.

2. Joint Index

3. Index Merge

#Index merging: Merge multiple single-column indexes and use #Analysis:
We can use index merging to solve everything that combined indexes can do, such as create index ne on s1(name,email);#Combined index We can create indexes for name and email separately. Combined indexes can hit:
select * from s1 where name='egon';
select * from s1 where name='egon' and email='adf';
Index merge can hit:
select * from s1 where name='egon';
select * from s1 where email='adf';
select * from s1 where name='egon' and email='adf';
At first glance, it seems that index merging is better: it can hit more cases, but in fact it depends on the case. If name='egon' and email='adf',
Then the efficiency of combined index is higher than that of index merge. If it is a single condition query, it is more reasonable to use index merge.

If we want to use indexes to achieve the desired effect of improving query speed, we must follow the following principles when adding indexes:

#1. The leftmost prefix matching principle is a very important principle.
create index ix_name_email on s1(name,email,)
- Leftmost prefix match: must match from left to right select * from s1 where name='egon'; #ok select * from s1 where name='egon' and email='asdf'; #ok select * from s1 where email='[email protected]'; #No MySQL will keep matching to the right until it encounters a range query (>, <, between, like) and stops matching.
For example, if a = 1 and b = 2 and c > 3 and d = 4, then if we create an index in the order (a, b, c, d),
There is no need for an index for d. If you create an index of (a,b,d,c), it can all be used. The order of a,b,d can be adjusted arbitrarily.
#2.= and in can be in any order, for example, a = 1 and b = 2 and c = 3. You can create an (a,b,c) index in any order. The MySQL query optimizer will help you optimize it into a form that the index can recognize. #3. Try to choose a column with high discrimination as the index. The formula for discrimination is count(distinct col)/count(*).
Indicates the proportion of fields that are not repeated. The larger the proportion, the fewer records we scan. The unique key has a discrimination of 1, while some states,
The gender field may have a discrimination value of 0 in the face of big data, so someone may ask, is there any empirical value for this ratio? Different usage scenarios,
This value is also difficult to determine. Generally, we require the fields that need to be joined to be above 0.1, that is, an average of 10 records are scanned for 1 record. #4. Index columns cannot be used in calculations. Keep the columns "clean", such as from_unixtime(create_time) = '2014-05-29'
The index cannot be used. The reason is very simple. The B+ tree stores the field values ​​in the data table.
However, when searching, the function needs to be applied to all elements for comparison, which is obviously too costly.
So the statement should be written as create_time = unix_timestamp('2014-05-29');

Leftmost prefix demonstration

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.39 sec)
mysql> create index idx on s1(id,name,email,gender); #Leftmost prefix not followed Query OK, 0 rows affected (15.27 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.43 sec)

mysql> drop index idx on s1;
Query OK, 0 rows affected (0.16 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> create index idx on s1(name,email,gender,id); #Follow the leftmost prefix Query OK, 0 rows affected (15.97 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.03 sec)
1 6. Leftmost prefix matches 2 index(id,age,email,name)
 3 #id must appear in the condition (as long as id appears, the speed will be improved)
 4 id
 5 id age
 6 id email
 7 id name
 8 
 9 email #No, if this is the beginning, the speed will not be improved. 10 mysql> select count(*) from s1 where id=3000;
11 +---------+
12 | count(*) |
13 +---------+
14 | 1 |
15 +----------+
16 1 row in set (0.11 sec)
17 
18 mysql> create index xxx on s1(id,name,age,email);
19 Query OK, 0 rows affected (6.44 sec)
20 Records: 0 Duplicates: 0 Warnings: 0
twenty one 
22 mysql> select count(*) from s1 where id=3000;
23 +---------+
24 | count(*) |
25 +---------+
26 | 1 |
27 +----------+
28 1 row in set (0.00 sec)
29 
30 mysql> select count(*) from s1 where name='egon';
31 +---------+
32 | count(*) |
33 +---------+
34 | 299999 |
35 +---------+
36 1 row in set (0.16 sec)
37 
38 mysql> select count(*) from s1 where email='[email protected]';
39 +---------+
40 | count(*) |
41 +---------+
42 | 1 |
43 +---------+
44 1 row in set (0.15 sec)
45 
46 mysql> select count(*) from s1 where id=1000 and email='[email protected]';
47 +----------+
48 | count(*) |
49 +---------+
50 | 0 |
51 +----------+
52 1 row in set (0.00 sec)
53 
54 mysql> select count(*) from s1 where email='[email protected]' and id=3000;
55 +---------+
56 | count(*) |
57 +---------+
58 | 0 |
59 +----------+
60 1 row in set (0.00 sec)
Create a joint index, leftmost match

When the index cannot be hit, please note:

- like '%xx'
    select * from tb1 where email like '%cn';
    
- Use function select * from tb1 where reverse(email) = 'wupeiqi';
    
- or
    select * from tb1 where nid = 1 or name = '[email protected]';
    
    Special: It will be invalid only when there is a column that is not indexed in the or condition. The following will use the index select * from tb1 where nid = 1 or name = 'seven';
            select * from tb1 where nid = 1 or name = '[email protected]' and email = 'alex'
            
- Type inconsistency If the column is a string type, the input condition must be enclosed in quotation marks, otherwise...
    select * from tb1 where email = 999;
Normal index does not mean that the index will not be used - !=
    select * from tb1 where email != 'alex'
    Special: If it is a primary key, the index will still be used select * from tb1 where nid != 123
->
    select * from tb1 where email > 'alex'
    
    Special: If the primary key or index is an integer type, the index will still be used select * from tb1 where nid > 123
        select * from tb1 where num > 123
        
#If the sorting condition is an index, the select field must also be an index field, otherwise it will not be hit - order by
    select name from s1 order by email desc;
    When sorting by index, if the selected query field is not an index, the index will not be used. select email from s1 order by email desc;
    Special: If the primary key is sorted, the index is still used:
        select * from tb1 order by nid desc;
- The leftmost prefix of the combined index If the combined index is: (name, email)
    name and email -- use index name -- use index email -- do not use index - count(1) or count(column) instead of count(*) makes no difference in mysql - create index xxxx on tb(title(19)) #text type, length must be specified
- Avoid using select *
- count(1) or count(column) instead of count(*)
- When creating a table, try to use char instead of varchar
- The order of the fields in the table is fixed-length fields first - Composite indexes instead of multiple single-column indexes (when multiple conditions are often used for query)
- Use short indexes whenever possible - Use JOINs instead of sub-queries
- When joining tables, make sure the condition type is consistent - Index hash values ​​(with few duplicates) are not suitable for indexing, for example: gender is not suitable

7. Basic steps for slow query optimization

0. Run first to see if it is really slow, and set SQL_NO_CACHE
1. Where condition single table query, lock the minimum return record table. This sentence means to apply the where clause of the query statement to the table with the smallest number of records returned, and start the query. Query each field of the table separately to see which field has the highest discrimination. 2. Explain to check whether the execution plan is consistent with the expectation in 1 (start the query from the table with fewer locked records).
3. Use the SQL statement of the form of order by limit to prioritize the sorted table. 4. Understand the usage scenarios of the business side. 5. Refer to the major principles of index creation when adding indexes. 6. Observe the results. If they do not meet expectations, continue to analyze from 0.

Summarize

This article ends here. I hope it can be helpful to you. I also hope you can pay more attention to more content on 123WORDPRESS.COM!

You may also be interested in:
  • Summary of some practical methods for Mysql query optimization
  • SQL Server query statement blocking optimization performance
  • MySQL slow query optimization solution
  • Detailed Analysis of Optimization Ideas for MySQL Large Data Query
  • A 20-second SQL slow query optimization solution
  • Detailed example of locating and optimizing slow query sql in MySQL
  • SQL uses composite indexes to optimize database queries

<<:  A brief analysis of the usage of HTML float

>>:  Docker custom network implementation

Recommend

Detailed explanation of the binlog log analysis tool for monitoring MySQL: Canal

Canal is an open source project under Alibaba, de...

JS operation object array to achieve add, delete, modify and query example code

1. Introduction Recently, I helped a friend to ma...

Related commands to completely uninstall nginx under ubuntu16.04

nginx Overview nginx is a free, open source, high...

A brief analysis of the game kimono memo problem

Today, after the game was restarted, I found that...

MySQL database must know sql statements (enhanced version)

This is an enhanced version. The questions and SQ...

Introduction to the role of HTML doctype

Document mode has the following two functions: 1. ...

Detailed steps to install mysql5.7.18 on Mac

1. Tools We need two tools now: MySQL server (mys...

How to set up the terminal to run applications after Ubuntu starts

1. Enter start in the menu bar and click startup ...

Uncommon but useful tags in Xhtml

Xhtml has many tags that are not commonly used but...

Docker Nginx container production and deployment implementation method

Quick Start 1. Find the nginx image on Docker Hub...

HTML+CSS to achieve charging water drop fusion special effects code

Table of contents Preface: accomplish: Summarize:...

A brief discussion on CSS blocking merging and other effects

Non-orthogonal margins When margin is used, it wi...