1. Index Basics

1.1 Introduction

In MySQL, an index is also called a "key". It is a data structure that the storage engine uses to find records quickly. Indexes are critical to good performance, and their impact grows as the amount of data in a table grows. Index optimization is arguably the most effective way to improve query performance, and creating a truly optimal index often requires rewriting the SQL queries as well.

1.2 How Indexing Works

The easiest way to understand how an index works in MySQL is to think of the index of a book. To find a topic in a book, you usually look it up in the index first. Once you have the chapter and page number, you can go straight to the content you want. MySQL's storage engine uses indexes in a similar way: it first searches the index for the value, then uses the matching index entry to locate the corresponding data row, and finally returns the result set to the client.

1.3 Types of Indexes

In MySQL, the index types we usually refer to are as follows:
1.4 Indexing Methods

In MySQL, indexes are implemented at the storage engine level, not at the server level. The indexing methods MySQL supports, which can also be called index types in a broad sense, are mainly the following.

B-Tree index

When people talk about an index without specifying its type, they usually mean a B-Tree index. Different storage engines use B-Tree indexes in different ways, with different performance characteristics. For example, MyISAM uses prefix compression to make its indexes smaller, whereas InnoDB stores index values in their original format. Likewise, MyISAM indexes refer to rows by their physical storage location, whereas InnoDB indexes refer to rows by their primary key values.

A B-Tree stores the indexed values in sorted order, which makes it well suited to range searches. It speeds up data access because the storage engine no longer has to scan the whole table to find the rows it needs.

An index that includes values from multiple columns is a composite index. A composite index sorts its entries by the values of its columns, in the order in which the columns are listed in the index definition. For example:

    create table people (
        id int unsigned not null auto_increment primary key comment 'Primary key id',
        last_name varchar(20) not null default '' comment 'Last name',
        first_name varchar(20) not null default '' comment 'First name',
        birthday date not null default '1970-01-01' comment 'Date of birth',
        gender tinyint unsigned not null default 3 comment 'Gender: 1 male, 2 female, 3 unknown',
        key(last_name, first_name, birthday)
    ) engine=innodb default charset=utf8;

The following data has also been inserted into the people table:
We created the composite index key(last_name, first_name, birthday), which contains the last name, first name, and birthday values of every row in the table. Index entries are sorted in that column order: by last name, then by first name, and if two people have the same last and first name, by date of birth.

B-Tree indexes work for lookups by the full key value, by a range of key values, or by a key prefix, where a key prefix search only works on a leftmost prefix of the index. A composite index is effective for the following types of queries:

1. Full value match. All columns in the index are matched. For example, find people whose last name is Allen, first name is Cuba, and date of birth is 1960-01-01:

    select id,last_name,first_name,birthday from people where last_name='Allen' and first_name='Cuba' and birthday='1960-01-01';

2. Match the leftmost prefix. For example, use only the first column of the index to find all people with the last name Allen:

    select id,last_name,first_name,birthday from people where last_name='Allen';

3. Match a column prefix. For example, match only the beginning of the value in the first column of the index to find all people whose last names begin with A:

    select id,last_name,first_name,birthday from people where last_name like 'A%';

4. Match a range of values. For example, find people whose last names fall between Allen and Clinton:

    select id,last_name,first_name,birthday from people where last_name between 'Allen' and 'Clinton';

Here again only the first column of the index is used.

5. Match the first column exactly and a following column by range. For example, find people whose last name is Allen and whose first name starts with the letter C. The first column of the composite index is matched exactly and the second column is matched by range.
The SQL statement is:

    select id,last_name,first_name,birthday from people where last_name='Allen' and first_name like 'C%';

6. Index-only queries. A B-Tree index can usually support "index-only" queries, which access only the index and not the data rows. This relates to the covering index optimization discussed later.

Here are some situations in which a composite index cannot be used, or can only be partly used:

(1) If the search does not start from the leftmost column of the composite index, the index cannot be used. In the example above, the index cannot be used to find people whose first name is Cuba, or people born on a specific date, because neither column is the leftmost column of key(last_name, first_name, birthday). Similarly, the index cannot be used to find people whose last names end with a certain letter: a LIKE pattern that begins with the % wildcard cannot use the index.

(2) If a column of the index is skipped, only the columns before the skipped one can be used. For example, to find people with the last name Allen who were born on a specific date, MySQL can use only the first column of the index (last_name), because the query does not specify first_name.

(3) If a query has a range condition on a column, the columns to the right of that column cannot use the index for the lookup. For example, with the condition where last_name='Allen' and first_name like 'C%' and birthday='1992-10-25', only the first two columns of the index can be used, because the LIKE here is a range condition. If the range column has only a small number of possible values, you can replace the range condition with several equality conditions so that the columns to its right can also use the index.
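The rules above can be made concrete with a toy model in Python. It is a sketch under stated assumptions, not how InnoDB actually stores pages: the composite index key(last_name, first_name, birthday) is modelled as a sorted list of tuples with id appended as the row reference, the sample rows are made up, and the helper seek_scan is hypothetical. A B-Tree lookup is approximated by a binary search (the "seek") followed by a forward scan.

```python
from bisect import bisect_left

# Hypothetical sample rows for the people table: (last_name, first_name, birthday, id).
ROWS = [
    ("Basinger", "Viven", "1976-12-08", 5),
    ("Allen", "Meryl", "1980-12-12", 3),
    ("Allen", "Cuba", "1960-01-01", 1),
    ("Basinger", "Cuba", "1979-01-24", 6),
    ("Allen", "Kim", "1930-07-12", 2),
    ("Barrymore", "Julia", "2000-05-16", 4),
]

# Toy model of key(last_name, first_name, birthday): entries sorted by the
# three columns in index order, with id appended as the row reference.
INDEX = sorted(ROWS)


def seek_scan(index, start, keep_going):
    """Binary-search to the first entry >= start, then scan forward while
    keep_going(entry) holds. Returns the entries visited."""
    pos = bisect_left(index, start)  # a shorter tuple sorts before any tuple it prefixes
    visited = []
    while pos < len(index) and keep_going(index[pos]):
        visited.append(index[pos])
        pos += 1
    return visited


# Leftmost prefix: last_name = 'Allen' is one contiguous run of entries.
allen = seek_scan(INDEX, ("Allen",), lambda e: e[0] == "Allen")

# Column prefix: last_name LIKE 'A%' is also one contiguous run.
a_names = seek_scan(INDEX, ("A",), lambda e: e[0].startswith("A"))

# Exact first column plus a range on the second:
# last_name = 'Allen' AND first_name LIKE 'C%'.
allen_c = seek_scan(INDEX, ("Allen", "C"),
                    lambda e: e[0] == "Allen" and e[1].startswith("C"))

# After the range on first_name, birthday can only filter the scanned
# entries; it cannot narrow the seek itself.
allen_c_bday = [e for e in allen_c if e[2] == "1960-01-01"]

# Not a leftmost prefix: first_name = 'Cuba' matches are scattered,
# so every entry has to be examined.
positions = [i for i, e in enumerate(INDEX) if e[1] == "Cuba"]

print([e[3] for e in allen])  # [1, 2, 3]
print(positions)              # [0, 4]
```

In this model, conditions on a leftmost prefix turn into one seek plus a short contiguous scan, while the first_name-only search finds matches at non-adjacent positions and has to look at every entry, which mirrors restriction (1) above.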
All of these restrictions come from the order of the columns in the index, which shows how important that order is in a composite index. When tuning performance, you may need indexes with the same columns in different orders to serve different kinds of queries. For example, a table may need both key(last_name, first_name, birthday) and key(first_name, last_name, birthday).

The B-Tree index is the most commonly used index type. In the rest of this article, "index" means a B-Tree index unless otherwise specified.

1. Hash index

A hash index is implemented with a hash table and is useful only for queries that exactly match all columns of the index. In MySQL, only the Memory engine supports explicit hash indexes.

2. Spatial data index (R-Tree)

The MyISAM engine supports spatial indexes, which can be used to store geographic data (InnoDB also supports spatial indexes as of MySQL 5.7). Unlike B-Tree indexes, a spatial index does not require queries to use a leftmost prefix of the index.

3. Full-text index

A full-text index is a special type of index that looks for keywords in the text instead of directly comparing values in the index. Its matching works completely differently from other indexes and is closer to what a search engine does than to a simple where condition. You can create a full-text index and a B-Tree index on the same column at the same time. A full-text index is used by MATCH ... AGAINST operations, not by ordinary where conditions.

An index can contain the values of one column or of several columns. An index on several columns is generally called a composite index. In a composite index the column order matters a lot, because MySQL can only use a leftmost prefix of the index efficiently. One index on two columns is very different from two separate single-column indexes.
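The claim that a hash index only helps queries matching all of its columns can be illustrated with a small Python sketch. This is a toy under stated assumptions: a Python dict stands in for the hash table, the rows are made up, and exact_lookup and last_name_lookup are hypothetical helpers. The Memory engine's real implementation differs in detail.

```python
from collections import defaultdict

# Hypothetical sample rows: (last_name, first_name, birthday, id).
ROWS = [
    ("Allen", "Cuba", "1960-01-01", 1),
    ("Allen", "Kim", "1930-07-12", 2),
    ("Allen", "Meryl", "1980-12-12", 3),
    ("Barrymore", "Julia", "2000-05-16", 4),
]

# Toy hash index on (last_name, first_name, birthday): the dict hashes the
# whole key tuple and maps it to the matching row ids.
hash_index = defaultdict(list)
for last, first, bday, row_id in ROWS:
    hash_index[(last, first, bday)].append(row_id)


def exact_lookup(last, first, bday):
    """Full-key equality: answered with a single hash probe."""
    return hash_index.get((last, first, bday), [])


def last_name_lookup(last):
    """Partial key: the hash of the full tuple says nothing about its first
    column alone, so every entry in the table must be examined."""
    return sorted(row_id
                  for key, ids in hash_index.items() if key[0] == last
                  for row_id in ids)


print(exact_lookup("Allen", "Cuba", "1960-01-01"))
print(last_name_lookup("Allen"))
```

The first lookup is a single probe, while the second has to walk every entry, which is why a hash index cannot serve leftmost-prefix or range queries the way a B-Tree can.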
1.5 Advantages of Indexes

Indexes let MySQL find the data we need quickly, but that is not their only benefit. The most common index type, the B-Tree, stores data in sorted order, so MySQL can use it for ORDER BY and GROUP BY operations. Because the data is sorted, a B-Tree also stores related values close together. Finally, because the index stores the actual column values, some queries can get all the data they need from the index alone, without going back to the table. From these properties, we can conclude that indexes have the following three advantages:
In addition, some people use a "three-star system" to judge whether an index suits a query: the index earns one star if it places related records together, a second star if its entries are sorted in the order the query needs, and a third star if it contains all the columns the query requires.

Indexes are not always the best tool, and more indexes are not always better. In general, an index is worthwhile only when the benefit of helping the storage engine find rows quickly outweighs the extra work it causes. For very small tables, a simple full table scan is usually more efficient, so an index is unnecessary. For medium to large tables, indexes are very effective.

2. High-Performance Indexing Strategies

Creating and using indexes correctly is the foundation of high-performance queries. We have introduced the various index types and their strengths and weaknesses; now let's look at how to take real advantage of them. The following sections explain how to use indexes efficiently.

2.1 Independent Columns

We often see queries that use indexes poorly or that prevent MySQL from using existing indexes. MySQL generally cannot use the index on a column unless the column is independent in the query. An "independent column" is one that is not part of an expression and not an argument to a function. For example, the following query cannot use the primary key index on id:

    select id,last_name,first_name,birthday from people where id+1=3;

It is easy to see that this where condition is equivalent to where id=2, but MySQL does not rewrite the expression automatically. Make a habit of simplifying where conditions and always putting the indexed column alone on one side of the comparison operator.
So if you want to use the primary key index, write the query as:

    select id,last_name,first_name,birthday from people where id=2;

Here is another common mistake, where the indexed column date_col is hidden inside a function:

    select ... from ... where to_days(current_date()) - to_days(date_col) <= 10;

The condition can be rewritten so that date_col stands alone on one side of the comparison:

    select ... from ... where date_col >= date_sub(current_date(), interval 10 day);

2.2 Prefix Indexes and Index Selectivity

Sometimes we need to index very long character columns, which makes the index large and slow. A common solution is to index only the first few characters of the column. This saves a lot of index space and can make the index more efficient, but it also reduces the index's selectivity. Index selectivity is the ratio of the number of distinct index values (also called the cardinality) to the total number of rows in the table, so it lies between 0 and 1. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance. A sufficiently long prefix usually has high enough selectivity to meet query performance requirements.

For BLOB, TEXT, and very long VARCHAR columns, you must use a prefix index, that is, index only the first few characters of the column, because MySQL does not allow indexing the full length of these columns. To add a prefix index:

    alter table user add key(address(8)); -- Index only the first 8 characters of the address column

A prefix index is an effective way to make an index smaller and faster, but MySQL cannot use a prefix index for ORDER BY or GROUP BY, nor as a covering index.

Sometimes a suffix index would be useful, for example to find all email addresses in a given domain. MySQL does not support suffix indexes natively, but you can store the string reversed in a separate column, create a prefix index on that column, and maintain the reversed column with triggers.

2.3 Multi-Column Indexes

A multi-column index is an index that contains multiple columns, and the order of its columns must be chosen carefully.
A multi-column index is also called a composite index; key(last_name, first_name, birthday) above is one. A common mistake is to create a separate index on each column, or to create a multi-column index with the columns in the wrong order.

Let's look at the first problem: a separate index on each column. It is easy to spot in the output of show create table:

    create table t (
        c1 int,
        c2 int,
        c3 int,
        key(c1),
        key(c2),
        key(c3)
    );

This indexing strategy usually comes from vague expert advice such as "index all the columns in the where clause". In most cases, separate single-column indexes on several columns do not improve MySQL query performance. MySQL 5.0 introduced a strategy called index merge, which can use several single-column indexes on a table to locate rows to some extent, but it is still usually much less efficient than a suitable composite index.

For example, the film_actor table has separate single-column indexes on film_id and actor_id. Consider this query:

    select film_id,actor_id from film_actor where actor_id=1 or film_id=1;

In MySQL 5.0 and later, this query can scan both single-column indexes at the same time and merge the results. The index merge algorithm has three variants: union for OR conditions, intersection for AND conditions, and a union of intersections that combines the two. The query above uses a union of two index scans, which you can see in the Extra column of EXPLAIN (the value contains "union"):

    explain select film_id,actor_id from film_actor where actor_id=1 or film_id=1\G

An index merge is sometimes the result of successful optimization, but more often it indicates that the table's indexes are poorly designed.
The same result can also be obtained with a UNION ALL of two queries, each of which can use one single-column index:

    select film_id,actor_id from film_actor where actor_id=1
    union all
    select film_id,actor_id from film_actor where film_id=1 and actor_id<>1;

If you see an index merge union in the EXPLAIN output, check the query and the table structure carefully to see whether they are optimal, for example whether the query could be split into several queries combined with UNION.

2.4 Choosing the Right Column Order

Column order in a composite index is the most common source of confusion. The correct order depends on the queries that use the index, and you also need to consider how well it serves sorting and grouping.

In a composite index, entries are sorted by the leftmost column first, then by the second column, then the third, and so on. The index can therefore be scanned in ascending or descending order to satisfy ORDER BY, GROUP BY, and DISTINCT clauses that exactly match the column order.

When sorting and grouping are not a concern, it is often a good idea to put the most selective column first in the composite index; the index is then used only to optimize the lookups in the where condition. However, you may also need to choose the column order according to the queries that run most frequently, so that the index is as selective as possible for them. Take the following query as an example:

    select * from payment where staff_id=2 and customer_id=500;

Should you create key(staff_id, customer_id) or key(customer_id, staff_id)? You can run some queries to see how the values are distributed and which column is more selective. For example:

    select sum(staff_id=2), sum(customer_id=500) from payment\G

Suppose the result shows that sum(staff_id=2) is 7000 and sum(customer_id=500) is 60.
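The two measurements used in this section, the per-value counts (sum(col = value)) and the rule-of-thumb ratio count(distinct col)/count(*), can be mimicked in Python on a small, entirely made-up payment sample. The data, the helper names count_matches and selectivity, and the numbers they produce are illustrative assumptions only; they are not the 7000/60 figures quoted above.

```python
# Made-up payment sample: (staff_id, customer_id) pairs. Not real data.
PAYMENTS = [(1 + i % 2, 1 + i % 50) for i in range(1000)]
STAFF, CUSTOMER = 0, 1


def count_matches(rows, col, value):
    """Like sum(col = value): rows matched by an equality lookup on col."""
    return sum(1 for row in rows if row[col] == value)


def selectivity(rows, col):
    """Like count(distinct col) / count(*): the rule-of-thumb measure."""
    return len({row[col] for row in rows}) / len(rows)


# Per-value view for one specific query (staff_id=2, customer_id=50).
print(count_matches(PAYMENTS, STAFF, 2), count_matches(PAYMENTS, CUSTOMER, 50))

# Global view: which column should lead the index in general?
leading = max((STAFF, CUSTOMER), key=lambda col: selectivity(PAYMENTS, col))
print("customer_id first" if leading == CUSTOMER else "staff_id first")
```

In this sample both views agree that customer_id is the better leading column, but on real data the per-value counts can point the other way for some values, which is the caveat discussed next.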
From these two numbers we can see that, for this query, customer_id is more selective and should come first in the index, that is, key(customer_id, staff_id). However, be careful with this approach: the result depends heavily on the specific values chosen. An index order tuned for these values may be unfair to queries with other values and may even make overall server performance worse.

If you take the "worst queries" from a report produced by a tool such as pt-query-digest and choose the column order for those queries, this method is often very effective. If there are no specific queries like this to optimize, it is better to follow the rule of thumb, because the rule of thumb considers global cardinality and selectivity rather than a query with specific values. Following the rule of thumb, you can measure selectivity like this:

    select count(distinct staff_id)/count(*) as staff_id_selectivity,
           count(distinct customer_id)/count(*) as customer_id_selectivity
    from payment\G

Suppose the result shows that staff_id_selectivity is 0.001 and customer_id_selectivity is 0.086. A larger value means higher selectivity, so customer_id is more selective, and we make it the first column of the index:

    alter table payment add key(customer_id, staff_id);

The rules of thumb about selectivity and global cardinality are worth studying, but remember that other factors, such as ORDER BY and GROUP BY, can also have a large impact on query performance.

2.5 Clustered Indexes

A clustered index is not a separate index type but a way of storing data. The details depend on the implementation, but InnoDB's clustered index stores the B-Tree index and the data rows in the same structure. When a table has a clustered index, its rows are actually stored in the leaf pages of the index.
That is, the leaf pages contain the complete rows, while the non-leaf (node) pages contain only the indexed columns. Because storage engines implement indexes, not all storage engines support clustered indexes. This section focuses on InnoDB, but the principles discussed here apply to any storage engine that supports clustered indexes.

InnoDB clusters data by primary key. If no primary key is defined, InnoDB uses the first unique index whose columns are all NOT NULL instead. If there is no such index, InnoDB defines a hidden primary key internally and clusters the data on it.

Advantages of clustered indexes:
If you take full advantage of these benefits when designing tables and queries, you can greatly improve performance.

Disadvantages of clustered indexes:
In InnoDB, the clustered index "is" the table, so InnoDB does not need separate row storage as MyISAM does. Each leaf node of the clustered index contains the primary key value, the transaction ID, the rollback pointer used for transactions and MVCC (multiversion concurrency control), and all the remaining columns.

InnoDB's secondary (non-clustered) indexes are very different from the clustered index. The leaf nodes of a secondary index store primary key values rather than "row pointers". Therefore, a lookup through a secondary index requires two index lookups: the storage engine first searches the leaf nodes of the secondary index to get the primary key value, and then uses that value to find the row in the clustered index.

The easiest way to make sure rows are inserted in order is to define the primary key as auto_increment. With InnoDB, you should insert rows in primary key order whenever possible, using monotonically increasing primary key values for new rows.

Under highly concurrent workloads, however, inserting in primary key order can cause contention in InnoDB: the end of the primary key where new rows are appended can become a hot spot, and the AUTO_INCREMENT lock can also become a bottleneck. This problem can be serious and is worth investigating if your workload is affected.

2.6 Covering Indexes

People usually create indexes based on the where conditions of their queries, but that is only one aspect of index optimization. A well-designed index takes the whole query into account, not just the where condition. Indexes are indeed an efficient way to find data, but MySQL can also use an index to read column values directly, without reading the data rows. If the leaf nodes of an index already contain all the data a query needs, there is no need to go back to the table. An index that contains (or covers) the values of all the columns a query needs is called a "covering index" for that query.
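The difference between a secondary-index lookup that has to visit the clustered index and one that is covered can be sketched with a greatly simplified Python model. Everything here is an assumption for illustration: CLUSTERED, SECONDARY, and select_by_last_name are hypothetical names, the rows are made up, and real InnoDB details such as pages, transaction IDs, and rollback pointers are left out. The one property modelled faithfully is that secondary index entries end with the primary key value.

```python
from bisect import bisect_left

# Toy, greatly simplified InnoDB-style layout (hypothetical data).
# Clustered index: primary key -> complete row.
CLUSTERED = {
    1: {"id": 1, "last_name": "Allen", "first_name": "Cuba", "birthday": "1960-01-01", "gender": 1},
    2: {"id": 2, "last_name": "Allen", "first_name": "Kim", "birthday": "1930-07-12", "gender": 2},
    3: {"id": 3, "last_name": "Barrymore", "first_name": "Julia", "birthday": "2000-05-16", "gender": 2},
}

# Secondary index key(last_name, first_name, birthday): sorted entries whose
# last element is the primary key value, not a physical row pointer.
SECONDARY_COLUMNS = ("last_name", "first_name", "birthday", "id")
SECONDARY = sorted(tuple(row[c] for c in SECONDARY_COLUMNS) for row in CLUSTERED.values())


def select_by_last_name(last_name, columns):
    """Model of: select <columns> from people where last_name = ?.

    Returns the result rows and the number of clustered-index lookups."""
    covered = set(columns) <= set(SECONDARY_COLUMNS)
    pos = bisect_left(SECONDARY, (last_name,))
    rows, clustered_lookups = [], 0
    while pos < len(SECONDARY) and SECONDARY[pos][0] == last_name:
        entry = dict(zip(SECONDARY_COLUMNS, SECONDARY[pos]))
        if not covered:
            # Second lookup: use the stored primary key to fetch the full row.
            entry = CLUSTERED[entry["id"]]
            clustered_lookups += 1
        rows.append({c: entry[c] for c in columns})
        pos += 1
    return rows, clustered_lookups


print(select_by_last_name("Allen", ["id", "first_name"]))
print(select_by_last_name("Allen", ["id", "gender"]))
```

Asking for id and first_name is answered entirely from the secondary index (zero clustered-index lookups), while asking for gender needs one extra clustered-index lookup per matching entry.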
Covering indexes are very useful and can greatly improve performance. Consider how much benefit would be gained if the query only had to scan the index instead of going back to the table to fetch the rows:
In all of these scenarios, completing the whole query within the index is generally much cheaper than going back to the table.

B-Tree indexes can be covering indexes, but hash, spatial, and full-text indexes cannot, because they do not store the values of the columns they index. When a query is covered by an index (an index-covered query), the Extra column of EXPLAIN shows "Using index". For example:

    explain select id from people;
    explain select last_name from people;
    explain select id,first_name from people;
    explain select last_name,first_name,birthday from people;
    explain select last_name,first_name,birthday from people where last_name='Allen';

The people table is the one created earlier. It has a primary key index (id) and a composite index key(last_name, first_name, birthday). Together these two indexes cover the values of four columns, and because InnoDB secondary index entries include the primary key, the composite index alone contains all four. If a query only needs columns from these four, it can be an index-covered query.

If an index contains the values of all the columns a query needs, that index is a covering index for the query. For example, key(last_name, first_name, birthday) is a covering index for select last_name,first_name from people.

2.7 Using Index Scans for Sorting

MySQL has two ways to produce an ordered result set: a sort operation, or scanning an index in order. The two are not mutually exclusive: an ORDER BY clause can be satisfied by scanning an index. More precisely, MySQL sorts result sets in one of two ways:

1. Index sorting

Index sorting means that rows are returned in the order of the index, so no separate sort step is needed. If the type column of EXPLAIN shows index, MySQL is scanning the whole index in index order, so the rows come out already sorted by that index.
For example:

    explain select id from people;
    explain select id,last_name from people order by id desc;
    explain select last_name from people;
    explain select last_name from people order by last_name;
    explain select last_name from people order by last_name desc;

Note: even if type in EXPLAIN is not index, the query may still be sorted by the index. For example:

    explain select id from people where id >3;
    explain select id,last_name from people where id >3 order by id desc;

2. File sorting

File sorting means that the result set is sorted by an extra operation before it is returned to the client. This method does not use the index order and is less efficient. MySQL calls this operation filesort, although it does not necessarily use files on disk. If the Extra column of EXPLAIN contains "Using filesort", a filesort is being performed, and you should consider optimizing the index or the query. For example:

    explain select id,last_name,first_name from people where id > 3 order by last_name;

MySQL can use the same index for both lookups and sorting, so when designing indexes, try to satisfy both whenever possible. Index sorting can be used only when the index contains the columns in the where condition and the columns in the order by clause, and the order of the index columns matches the order of those columns taken together as where + order by (where first, then order by); in other words, where + order by must follow the leftmost prefix of the index.

Now let's optimize the query above so that it uses index sorting. First, add a multi-column index:

    alter table people add key(id,last_name);

If you only add key(id,last_name), the query still cannot use index sorting. This is because where + order by must also satisfy the leftmost prefix rule of the index, and where id > 3 is a range condition, so the subsequent order by last_name cannot use key(id,last_name).
Second, change order by last_name in the query to order by id,last_name. Note that this changes the meaning of the query: because id is unique, the rows are now effectively ordered by id rather than by last_name, so this rewrite is only appropriate if that order is acceptable.

Note: if a query joins multiple tables, index sorting can be used only when all the columns in the order by clause come from the first table.

Here are some situations where index sorting cannot be used:

1. The order by clause sorts by several columns in different directions, that is, some columns are asc (ascending, the default) and some are desc (descending). For example:

    explain select * from people where last_name='Allen' order by first_name asc, birthday desc;

(MySQL 8.0 and later support descending indexes, so an index declared with matching ASC/DESC columns can handle mixed directions.)

2. The order by clause contains a column that is not in the index. For example:

    explain select * from people where last_name='Allen' order by first_name, gender;

3. The first column of the index has a range condition. For example:

    explain select * from people where last_name like 'A%' order by first_name;

In this case, the query can be rewritten as follows (note that this also changes the result order):

    explain select * from people where last_name like 'A%' order by last_name,first_name;

2.8 Redundant and Duplicate Indexes

MySQL allows you to create multiple indexes on the same column, as long as the index names differ. Because MySQL has to maintain each duplicate index separately, and the optimizer has to consider each one when optimizing queries, duplicate indexes hurt performance.

Duplicate indexes are indexes of the same type created on the same columns in the same order. Avoid creating them, and drop them as soon as you find them.

Redundant indexes are different from duplicate indexes. If you have an index (A, B) and then create an index (A), the second index is redundant, because (A) is a leftmost prefix of (A, B), and (A, B) can also be used as an index on (A). However, an index on (B, A) is not redundant.
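The definitions of duplicate and redundant indexes above reduce to simple comparisons of column lists, which the following Python sketch checks. It is a minimal illustration under stated assumptions: classify_indexes is a hypothetical helper, all indexes are assumed to be B-Tree indexes of the same type, and it ignores the InnoDB subtlety (covered in the next paragraphs) that secondary indexes implicitly end with the primary key.

```python
def classify_indexes(indexes):
    """Report duplicate and redundant B-Tree indexes.

    indexes maps an index name to its tuple of column names, in index order.
    Returns two lists: duplicates as (name, same_as) pairs, and redundant
    indexes as (name, covered_by) pairs, where name's columns are a leftmost
    prefix of covered_by's columns."""
    items = list(indexes.items())
    duplicates, redundant = [], []
    for i, (name, cols) in enumerate(items):
        for j, (other, other_cols) in enumerate(items):
            if i == j:
                continue
            if cols == other_cols:
                if i > j:  # report each duplicate pair only once
                    duplicates.append((name, other))
            elif other_cols[:len(cols)] == cols:
                redundant.append((name, other))
    return duplicates, redundant


example = {
    "idx_a": ("A",),
    "idx_ab": ("A", "B"),
    "idx_ab_copy": ("A", "B"),
    "idx_ba": ("B", "A"),
}
dups, redundant = classify_indexes(example)
print(dups)
print(redundant)
```

Here idx_ab_copy is reported as a duplicate of idx_ab, idx_a as redundant with both (A, B) indexes, and idx_ba is left alone because (B, A) does not start with (A).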
Redundant indexes usually appear when new indexes are added to a table. For example, someone might add a new index (A, B) instead of extending the existing index (A). Another case is extending a secondary index (A) to (A, ID), where ID is the primary key. In InnoDB, the primary key columns are already included in every secondary index, so this is also redundant.

In most cases, redundant indexes are unnecessary, and you should extend existing indexes rather than create new ones. Sometimes, however, redundant indexes are needed for performance reasons, because extending an existing index makes it larger, which can slow down other queries that use it.

Be careful when extending indexes. Because the leaf nodes of a secondary index contain the primary key values, an index on (A) is effectively an index on (A, ID). A query such as where A=5 order by ID can benefit greatly from index (A). But if you change index (A) to index (A, B), it effectively becomes (A, B, ID), and the order by in that query can no longer use index sorting; it has to use a filesort. It is a good idea to double-check planned index changes with the pt-upgrade tool from Percona Toolkit. In short, only extend an existing index when you understand all the queries that use it. Otherwise, creating a new index (and leaving the original one redundant) is the safest approach.

2.9 Unused Indexes

A MySQL server may have indexes that are never used. Such indexes are pure overhead, and you should consider dropping them. Note, however, that a unique index may never be used by any query and still be needed, because its uniqueness constraint prevents duplicate data.