MySQL performance optimization: how to use indexes efficiently and correctly

Practice is the only test of truth. This article gives an overall picture of how indexes are used. Read it through, relate it to concrete examples or to places where you have used indexes before, and you will get a complete view of how indexes work. When you use or tune indexes in the future, you can start from these aspects and keep deepening your ability to use them correctly and efficiently.

1. Index failure

Index failure is a familiar topic. Whenever database optimization or indexes come up, people can reel off a long list of scenarios in which an index stops working and what should or should not be done. I will not list them all here.

Index failure means that an index exists on a column, but because of how the SQL statement is written, MySQL cannot use it.

If an indexed column appears as part of an expression, or is passed to a function or used in an arithmetic operation, MySQL cannot use the index on that column.

For example, the following query cannot use the index on the age column:

select id,name,age from t_user where age + 1 = 7;

It is easy to see that this WHERE condition is equivalent to age = 6, but MySQL does not rewrite the expression for you; simplifying it is entirely up to whoever writes the query.

(As described in the previous article, MySQL uses an index by searching it for a value and then fetching the row that corresponds to the matching entry. Once an operation is applied to the indexed column, MySQL cannot search the index for the computed value, so it scans the whole table and compares row by row.)
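A minimal sketch of the rewrite, assuming a secondary index on age. The t_user table shown in section 3 has no such index, so idx_age is a hypothetical name added only for this example. Compare the type and key columns of the two EXPLAIN outputs; on a table with very few rows the optimizer may prefer a full scan either way.

```sql
-- Hypothetical index on age (not part of the t_user definition in section 3)
ALTER TABLE t_user ADD INDEX idx_age (age);

-- The indexed column is wrapped in an expression: typically type = ALL (full table scan)
EXPLAIN SELECT id, name, age FROM t_user WHERE age + 1 = 7;

-- Same condition with the bare column on one side: idx_age can be used for the lookup
EXPLAIN SELECT id, name, age FROM t_user WHERE age = 6;
```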

2. Prefix Index and Index Selectivity

Sometimes you need to index a column that holds very long values, which makes the index large and slow. If you must index such a column, one solution is the simulated hash index described in the previous article.

Another common option is to index only the first few characters of the column (a prefix index). This saves a great deal of index space and improves index efficiency, but it also lowers the selectivity of the index.

The selectivity of an index is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table, T, so it ranges from 1/T to 1. The higher the selectivity, the more efficient the query, because a highly selective index lets MySQL filter out more rows during a lookup.

A unique index has a selectivity of 1, the best possible value, and performs best.

For BLOB, TEXT, or very long VARCHAR columns that are used as query conditions (something to avoid in principle, though sometimes necessary), you must use a prefix index, because MySQL does not allow indexing the full length of these columns.
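One common way to choose a prefix length is to compare the selectivity of several prefixes with the selectivity of the full column, then pick the shortest prefix that comes close. The sketch below does this for t_user.name; the candidate lengths 3, 5 and 10 and the index name idx_name_prefix are illustrative assumptions, not recommendations.

```sql
-- Selectivity of the full column: distinct values / total rows (between 1/T and 1)
SELECT COUNT(DISTINCT name) / COUNT(*) AS full_selectivity
FROM t_user;

-- Selectivity of candidate prefix lengths
SELECT COUNT(DISTINCT LEFT(name, 3))  / COUNT(*) AS sel_prefix_3,
       COUNT(DISTINCT LEFT(name, 5))  / COUNT(*) AS sel_prefix_5,
       COUNT(DISTINCT LEFT(name, 10)) / COUNT(*) AS sel_prefix_10
FROM t_user;

-- Index only the chosen prefix (10 characters here, purely as an example)
ALTER TABLE t_user ADD INDEX idx_name_prefix (name(10));
```

Average selectivity can hide skew, so it is also worth checking the most frequent prefixes (GROUP BY LEFT(name, n) ORDER BY COUNT(*) DESC) before settling on a length.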

3. Multi-column index

A multi-column (composite) index is a single index built over several columns. It is not the same as creating a separate index on each column.

When optimizing SQL, some people index every column in the WHERE condition in the hope of improving query performance. In fact this is usually a mistake. At best it produces a "one-star" index, whose performance can be orders of magnitude worse than a truly optimal index. When a "three-star" index cannot be designed, it is often better to ignore the WHERE clause and focus on optimizing the order of the index columns, or to create a fully covering index.

Three-star index: in the book Relational Database Index Design and the Optimizers, Lahdenmaki and Leach describe a "three-star system" for judging whether an index suits a query. The index earns one star if it places related rows next to each other, a second star if its rows are sorted in the order the query needs, and a third star if it contains all the columns the query needs.

Creating independent single-column indexes on several columns does not improve MySQL query performance in most cases, so this is also the wrong approach.

MySQL 5.0 and later introduced an index merge strategy, which can, to some extent, use several single-column indexes on a table to locate rows. Earlier versions could use only one of the single-column indexes, and in that situation none of them was very effective on its own.

The index merge strategy is sometimes the result of a useful optimization, but more often it indicates that the table's indexes are poorly designed:

1) When the server intersects several indexes (usually for multiple AND conditions), it usually means you need one multi-column index containing all the relevant columns rather than several independent single-column indexes.

2) When the server unions several indexes (usually for multiple OR conditions), the algorithm's buffering, sorting, and merging operations often consume a lot of CPU and memory, especially when some of the indexes are not very selective and the scans return large amounts of data to merge.

3) The optimizer does not include these costs in its query cost estimate; it cares only about random page reads. The cost is therefore underestimated, and the chosen plan can be worse than a plain full table scan. Such a plan not only consumes more CPU and memory but can also hurt concurrency, although this effect is easy to overlook when the query is tested in isolation.

If you see an index merge in EXPLAIN output, check whether the query and table structure are optimal. You can also turn index merge off with the optimizer_switch system variable, or use the IGNORE INDEX hint to make the optimizer ignore particular indexes.
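A sketch of both options, assuming single-column indexes on name and age exist. The t_user table in this section has neither, so idx_name and idx_age are hypothetical names. Whether EXPLAIN actually reports an index merge depends on the data and statistics.

```sql
-- Assumes hypothetical single-column indexes, for example:
--   ALTER TABLE t_user ADD INDEX idx_name (name), ADD INDEX idx_age (age);

-- An OR across two single-column indexes may show type = index_merge,
-- with Extra similar to "Using union(idx_name,idx_age)"
EXPLAIN SELECT id, name, age FROM t_user WHERE name = 'xcbeyond' OR age = 6;

-- Option 1: disable index merge for the current session only
SET SESSION optimizer_switch = 'index_merge=off';
EXPLAIN SELECT id, name, age FROM t_user WHERE name = 'xcbeyond' OR age = 6;
SET SESSION optimizer_switch = 'index_merge=on';  -- restore the default

-- Option 2: keep index merge on, but hide one index from this query
EXPLAIN SELECT id, name, age
FROM t_user IGNORE INDEX (idx_age)
WHERE name = 'xcbeyond' OR age = 6;
```

Both are diagnostic tools; the lasting fix is usually a better index design, such as the single multi-column index described in point 1.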

For a multi-column index, the index can generally be used as long as the query conditions include the leftmost column of the index.

The following are some examples:

Table t_user has a multi-column index on (id, name):

mysql> show create table t_user;
+--------+--------------+
| Table  | Create Table |
+--------+--------------+
| t_user | CREATE TABLE `t_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  KEY `idx` (`id`,`name`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 |
+--------+--------------+
1 row in set

Querying by id:

mysql> explain select * from t_user where id = 1;
+----+-------------+--------+------------+------+---------------+-----+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys | key | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+-----+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t_user | NULL       | ref  | idx           | idx | 4       | const |    1 |      100 | NULL  |
+----+-------------+--------+------------+------+---------------+-----+---------+-------+------+----------+-------+
1 row in set

The type column in the execution plan shows ref (with key = idx), so the index is used. If you query by name alone, however, the index cannot be used and MySQL performs a full table scan (type = ALL):

mysql> explain select * from t_user where name = 'xcbeyond';
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t_user | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    7 |    14.29 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set

4. Choose the appropriate index column order

Index column order really matters. The correct order depends on the queries that use the index, and you also need to consider how well it serves sorting and grouping (this applies only to B-Tree indexes; hash and other index types do not store data in order).

In a multi-column B-Tree index, entries are sorted first by the leftmost column, then by the next column, and so on. The index can therefore be scanned in ascending or descending order to satisfy ORDER BY, GROUP BY, and DISTINCT clauses that match the column order.

So the order of columns in a multi-column index is crucial. A well-known rule of thumb is to put the most selective column first. This rule is useful in some scenarios, but it is usually less important than avoiding random I/O and sorting, and the problem needs to be considered as a whole.

When sorting and grouping are not a concern, putting the most selective columns first is often a good idea. In that case the index only serves to optimize WHERE lookups, and an index designed this way does filter out the needed rows as quickly as possible; it is also more selective for queries that use only a prefix of the index columns in the WHERE clause. However, performance depends not only on the overall selectivity of the index columns but also on the specific values in the query conditions, that is, on the distribution of values. You may therefore need to choose the column order that is most selective for the queries you run most often.
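A sketch of both checks on t_user, comparing name and age as candidates for the leftmost position. The literals 'xcbeyond' and 6 stand in for values taken from your most frequent queries; they are assumptions for illustration only.

```sql
-- Overall selectivity of each candidate column (1/T .. 1); higher is more selective
SELECT COUNT(DISTINCT name) / COUNT(*) AS name_selectivity,
       COUNT(DISTINCT age)  / COUNT(*) AS age_selectivity
FROM t_user;

-- Selectivity for the values a frequent query actually uses: how many rows
-- each condition alone would match (fewer matching rows = more selective here)
SELECT SUM(name = 'xcbeyond') AS rows_matching_name,
       SUM(age = 6)           AS rows_matching_age
FROM t_user;
```

If name is the more selective column for the queries that matter, an index ordered as (name, age) is a reasonable starting point; if the two checks disagree, weight the second one by how often each query runs.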

5. Clustered Index

A clustered index is not a separate type of index but a way of storing data in which the rows are stored together with the index, so finding the index leaf page means finding the data. The exact details depend on the implementation, but InnoDB's clustered index actually stores a B-Tree index and the data rows in the same structure.

Non-clustered index: the data and the index are stored separately, and the leaf nodes of the index point to the corresponding data rows. To access data through such an index, MySQL first searches the index (in memory if it is cached) and then uses the result to find the row on disk. This is why lookups are slow when the index is not found in the key buffer.

When a table has a clustered index, its data rows are actually stored in the leaf pages of the index. "Clustered" means that rows with adjacent key values are stored close to each other. Because rows cannot be stored in two different places at the same time, a table can have only one clustered index.

How InnoDB chooses the clustered index:

By default, InnoDB clusters the data by the primary key. If no primary key is defined, it uses a unique NOT NULL index instead. If there is no such index, InnoDB implicitly defines a hidden primary key and clusters on that. InnoDB clusters records together only within a page; pages with adjacent key values may be far apart.

(If you are familiar with the B-Tree structure, this explains why the key and the data are stored together as a tuple in each leaf node.)

A clustered primary key can help performance, but it can also cause serious performance problems. Think carefully about clustering, especially when you change a table's storage engine from InnoDB to another engine, or vice versa.

Advantages of clustered indexes:

  • Related data can be stored together. For example, when implementing a mailbox, you can cluster the data by user ID, so that retrieving all of a user's messages requires reading only a few pages from disk.
  • Faster data access. A clustered index keeps the index and the data in the same B-Tree, so retrieving rows from a clustered index is usually faster than a comparable lookup through a non-clustered index.
  • Queries that use covering index scans can use the primary key values stored in the leaf nodes directly.

Disadvantages of clustered index:

  • Clustering gives the biggest improvement for I/O-bound workloads. If all the data fits in memory, the order in which it is accessed hardly matters, and clustering offers little advantage.
  • Insertion speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data is not loaded in primary key order, it is best to reorganize the table with OPTIMIZE TABLE after the load finishes.
  • Updating clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location.
  • Tables built on a clustered index can suffer page splits when new rows are inserted, or when a row's primary key is updated and the row has to move. When a row's primary key value requires it to go into a page that is already full, the storage engine splits the page into two to make room. Page splits make the table use more disk space.
  • Clustered indexes can make full table scans slower, especially when rows are sparsely packed or stored non-contiguously because of page splits.
  • Secondary (non-clustered) indexes can be larger than you expect, because their leaf nodes contain the primary key columns of the referenced rows.
  • Accessing a row through a secondary index requires two index lookups instead of one: first in the secondary index to find the primary key, then in the clustered index to find the row.
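A sketch of the mailbox example from the advantages list, using a hypothetical t_mail table. Clustering on (user_id, mail_id) keeps each user's messages together, but new rows no longer arrive in primary key order, which brings in the insertion-order and page-split costs listed above.

```sql
-- Hypothetical mailbox table clustered by user: one user's messages are
-- adjacent in the clustered index (the primary key)
CREATE TABLE t_mail (
  user_id INT UNSIGNED    NOT NULL,
  mail_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  subject VARCHAR(255)    NOT NULL,
  body    TEXT,
  PRIMARY KEY (user_id, mail_id),
  -- InnoDB requires the AUTO_INCREMENT column to be the first column of some index
  KEY idx_mail_id (mail_id)
) ENGINE=InnoDB;

-- Reading one mailbox is a range scan over adjacent clustered-index pages
SELECT mail_id, subject FROM t_mail WHERE user_id = 42 ORDER BY mail_id;

-- Rows arrive in mail_id order, not (user_id, mail_id) order, so inserts land in the
-- middle of the index and can split pages; rebuild after heavy insert churn
OPTIMIZE TABLE t_mail;
```

On InnoDB, OPTIMIZE TABLE is carried out as a table rebuild, so schedule it like any other table-copying ALTER.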

6. Covering Index

People usually create indexes based on a query's WHERE conditions, but that is only one aspect of index optimization. A well-designed index takes the whole query into account, not just the WHERE condition. Indexes are an efficient way to find rows, but MySQL can also use an index to retrieve column values directly, without reading the rows themselves. An index that contains all the columns a query needs (in the SELECT list as well as in WHERE, ORDER BY, and so on) is called a "covering index".

The benefits of covering indexes are as follows:

  • Index entries are usually much smaller than full rows, so if MySQL only needs to read the index, it can greatly reduce the amount of data it accesses. This matters especially for cached workloads, where much of the response time is spent copying data. Covering indexes also help I/O-bound applications, because the index is smaller than the data and fits in memory more easily.
  • Because indexes are stored in order of the column values, I/O-bound range queries need far less I/O than fetching each row from a random location on disk. For some storage engines, such as MyISAM and Percona XtraDB, OPTIMIZE TABLE can even make the index fully sorted, which lets simple range queries read the index completely sequentially.
  • Some storage engines, such as MyISAM, cache only indexes in memory and rely on the operating system to cache the data, so accessing the data requires a system call. This can cause serious performance problems, especially when system calls are the biggest cost of data access.
  • Covering indexes are particularly useful for InnoDB tables because of InnoDB's clustered index. InnoDB's secondary indexes store the row's primary key value in their leaf nodes, so if a secondary index covers the query, the second lookup in the primary key index is avoided.

Not every type of index can be a covering index. A covering index must store the values of the indexed columns; hash, spatial, and full-text indexes do not store these values, so MySQL can use only B-Tree indexes as covering indexes. In addition, storage engines implement covering indexes differently, and not every engine supports them.
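A minimal sketch on the t_user table from section 3, with idx_name_age as a hypothetical index name. Note that t_user as shown has no primary key and no unique NOT NULL index, so InnoDB clusters it on a hidden row ID and its secondary indexes do not contain id. On a table as small as the 7-row example, the optimizer may still prefer a full scan.

```sql
-- Hypothetical composite index containing every column the first query touches
ALTER TABLE t_user ADD INDEX idx_name_age (name, age);

-- Covered: name and age are both read from idx_name_age,
-- so the Extra column is expected to show "Using index"
EXPLAIN SELECT name, age FROM t_user WHERE name = 'xcbeyond';

-- Not covered: SELECT * also needs id, which idx_name_age does not store,
-- so every matching index entry requires a lookup of the full row
EXPLAIN SELECT * FROM t_user WHERE name = 'xcbeyond';
```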

7. Use index scan to sort

MySQL has two ways to produce ordered results: it can perform a sort operation, or it can scan an index in order. If the type column in EXPLAIN shows index, MySQL is scanning an index in index order, which lets it return rows already sorted (do not confuse this with "Using index" in the Extra column). When MySQL has to sort, the Extra column shows "Using filesort".

Scanning the index itself is fast, because it only has to move from one index entry to the next. However, if the index does not cover all the columns the query needs, MySQL has to look up the corresponding row for every index entry it scans. This is essentially random I/O, so reading data in index order is usually slower than a sequential full table scan, especially for I/O-bound workloads.

MySQL can use the same index both for sorting and for finding rows, so when possible, design the index to serve both purposes, that is, include the sort columns in the index.

  • MySQL can use an index to sort the results only when the order of the index columns exactly matches the order of the columns in the ORDER BY clause and all columns are sorted in the same direction.
  • If the query joins multiple tables, the index can be used for sorting only when every column in the ORDER BY clause comes from the first table. ORDER BY is subject to the same restriction as lookups: it must use a leftmost prefix of the index (unless the leading index columns are fixed to constants in the WHERE clause); otherwise MySQL has to perform a sort and cannot use the index for ordering.
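A sketch of these rules on t_user, assuming a composite index on (name, age) exists (idx_name_age is a hypothetical name). Look for "Using filesort" in the Extra column of each plan.

```sql
-- Assumes a composite index on (name, age), for example:
--   ALTER TABLE t_user ADD INDEX idx_name_age (name, age);

-- Matches the index column order and direction: no "Using filesort" expected
EXPLAIN SELECT name, age FROM t_user ORDER BY name, age;

-- Leading column fixed by an equality: rows for that name are already ordered by age
EXPLAIN SELECT name, age FROM t_user WHERE name = 'xcbeyond' ORDER BY age;

-- age alone is not a leftmost prefix of the index: expect "Using filesort"
EXPLAIN SELECT name, age FROM t_user ORDER BY age;

-- Mixed directions do not match an ascending (name, age) index: expect "Using filesort"
-- (MySQL 8.0 can avoid this with an index declared as (name ASC, age DESC))
EXPLAIN SELECT name, age FROM t_user ORDER BY name ASC, age DESC;
```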

8. Redundant and duplicate indexes

Duplicate indexes are indexes of the same type created on the same columns in the same order. Avoid creating them, and remove any you find right away.

For example:

create table test (
  id int not null primary key,
  a int not null,
  b int not null,
  unique(id),
  index(id)
) engine=InnoDB;

An inexperienced user might create a primary key, add a unique constraint (unique(id)), and then add an index (index(id)) for queries to use. However, MySQL implements both unique and primary key constraints with indexes, so this statement actually creates three duplicate indexes on the same column. There is usually no reason to do this, unless you need different types of indexes on the same column to serve different kinds of queries.
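A sketch of cleaning up the example table: list its indexes, then drop the two redundant ones and keep the primary key. The names id and id_2 are what MySQL typically generates for unnamed indexes whose first column is id; confirm them in the SHOW INDEX output before dropping anything.

```sql
-- Expect PRIMARY plus two auto-named indexes on id (typically id and id_2)
SHOW INDEX FROM test;

-- Keep the primary key and drop the duplicates, using the names SHOW INDEX reported
ALTER TABLE test DROP INDEX id, DROP INDEX id_2;
```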

Redundant indexes differ somewhat from duplicate indexes. If you create an index (A,B) and then an index (A), the second one is redundant, because (A) is a leftmost prefix of (A,B) and (A,B) can serve any query that (A) can. An index (B,A), however, is not redundant, and neither is an index (B), because B is not a leftmost prefix of (A,B). In addition, indexes of other types, such as hash or full-text indexes, are not redundant with B-Tree indexes, whatever columns they cover.

Redundant indexes usually appear when someone adds new indexes to a table. For example, someone might add a new index (A,B) instead of extending the existing index (A). Another case is extending an index to (A,ID), where ID is the primary key: in InnoDB the primary key columns are already included in every secondary index, so this is also redundant.

In most cases you do not need redundant indexes; extend existing indexes instead of creating new ones. Sometimes, though, redundant indexes are needed for performance reasons, because extending an existing index would make it much larger and slow down other queries that use it. For example, adding a very long VARCHAR column to an index on an integer column can make performance drop dramatically, especially if queries use the original index as a covering index, or if the table is a MyISAM table that serves many range queries.

The fix for redundant and duplicate indexes is simple: drop them. The first step is finding them. You can write fairly complex queries against the information_schema tables, but there are two simpler approaches. One is to use the views in Shlomi Noach's common_schema, a collection of useful stored routines and views that you can install on a server. The other is pt-duplicate-key-checker from Percona Toolkit, which analyzes table structures to find redundant and duplicate indexes.
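If you do want to query information_schema yourself, a rough sketch is to compare the column lists of the B-Tree indexes on each table and report any index whose columns are a leftmost prefix of (or identical to) another index's columns. It requires MySQL 8.0 because it uses a CTE, it ignores prefix lengths, and it does not detect the InnoDB (A, ID) case, so treat its output as candidates to review, not a list to drop.

```sql
-- Sketch only: candidate redundant/duplicate B-Tree indexes in the current schema
WITH idx AS (
  SELECT table_name, index_name, non_unique,
         GROUP_CONCAT(column_name ORDER BY seq_in_index) AS cols
  FROM information_schema.STATISTICS
  WHERE table_schema = DATABASE()
    AND index_type = 'BTREE'
  GROUP BY table_name, index_name, non_unique
)
SELECT r.table_name,
       r.index_name AS candidate_index,  r.cols AS candidate_cols,
       r.non_unique AS candidate_non_unique,  -- 0 = PRIMARY/UNIQUE: enforces a constraint
       c.index_name AS covered_by_index, c.cols AS covered_by_cols
FROM idx AS r
JOIN idx AS c
  ON  c.table_name  = r.table_name
  AND c.index_name <> r.index_name
  -- r.cols plus a trailing comma must be the start of c.cols plus a trailing comma
  AND LEFT(CONCAT(c.cols, ','), CHAR_LENGTH(r.cols) + 1) = CONCAT(r.cols, ',')
ORDER BY r.table_name, r.index_name;
```

Exact duplicates show up twice, once in each direction, and a candidate with candidate_non_unique = 0 should not be dropped without first deciding which index keeps enforcing the constraint.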

9. Unused Indexes

In addition to redundant and duplicate indexes, a server may have indexes that it never uses. Such indexes are pure overhead, and it is recommended to drop them.

You can find them with the table_io_waits_summary_by_index_usage table in performance_schema:

SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_star = 0
ORDER BY object_schema, object_name, index_name;
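Two related steps, sketched under stated assumptions: MySQL 5.7 and later ship the sys schema, whose schema_unused_indexes view is built on the same performance_schema statistics, and MySQL 8.0 can make an index invisible to the optimizer as a reversible trial before you drop it. idx_age is a hypothetical index name.

```sql
-- sys schema view over the same per-index I/O statistics
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;

-- The counters only cover activity since the server last started, so check over a
-- representative period first. MySQL 8.0: hide the index from the optimizer as a trial
ALTER TABLE t_user ALTER INDEX idx_age INVISIBLE;

-- If nothing regresses, drop it; otherwise undo with ALTER INDEX idx_age VISIBLE
ALTER TABLE t_user DROP INDEX idx_age;
```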

10. Indexes and Locks

Indexes let queries lock fewer rows. If your queries never touch rows they do not need, fewer rows are locked, which helps performance in two ways.

First, although InnoDB's row locks are very efficient and use very little memory, locking rows still has some overhead. Second, locking more rows than necessary increases lock contention and reduces concurrency.
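A sketch for observing this, assuming MySQL 8.0 or later (for performance_schema.data_locks), the default REPEATABLE READ isolation level, and t_user exactly as shown in section 3, where no index can be used for a condition on name alone.

```sql
START TRANSACTION;

-- With no usable index, InnoDB scans the whole clustered index and locks
-- every row it examines, not just the rows that match
SELECT id, name FROM t_user WHERE name = 'xcbeyond' FOR UPDATE;

-- List the locks currently held on t_user (one row per lock)
SELECT index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks
WHERE object_name = 't_user';

ROLLBACK;

-- Repeat the transaction after adding an index on name (hypothetical name idx_name):
--   ALTER TABLE t_user ADD INDEX idx_name (name);
-- The lock list should shrink to the matching idx_name entries, their rows in the
-- clustered index, and the gap after the last match.
```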

11. Summary

The sections above explain how to use indexes efficiently and how to avoid misusing them. Indexes look simple, but using them well is complicated and takes constant practice.

The following summary and suggestions apply to everyday index use:

  1. Create indexes on highly selective columns so that they can be used effectively. If selectivity is too low, the index cannot filter effectively and MySQL may have to scan all the data pages, which is little better than having no index at all.
  2. For composite indexes, follow the leftmost-prefix matching rule: columns are matched from left to right, and matching stops at the first range condition (>, <, BETWEEN, LIKE). For example, for the condition a = 1 and b = 2 and c > 3 and d = 4, an index on (a, b, c, d) cannot use d for the lookup, whereas an index on (a, b, d, c) can use all four columns; in that index the order of a, b, and d can be rearranged freely.
  3. Avoid SELECT * where you can, and use covering indexes to reduce lookups back to the table and improve efficiency.
  4. Some queries can combine a composite index with index condition pushdown, which also reduces lookups back to the table and improves efficiency.
  5. Do not apply functions or operators to indexed columns; doing so prevents the index from being used.
  6. Comparing a string column with a number prevents the index from being used, because MySQL has to convert the column value in every row.
  7. A LIKE pattern with a leading wildcard, such as '%value%', cannot use the index and results in a full table scan, whereas 'value%' can use the index.
  8. Sort on indexed columns where possible, which avoids extra sorting and improves query efficiency.
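A sketch of tips 2, 6 and 7 on a hypothetical table t_demo. The plans you actually see depend on data volume, statistics and MySQL version, so read the key and key_len columns of EXPLAIN rather than expecting identical output.

```sql
-- Hypothetical table for illustrating the tips above
CREATE TABLE t_demo (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a     INT NOT NULL,
  b     INT NOT NULL,
  c     INT NOT NULL,
  d     INT NOT NULL,
  phone VARCHAR(20) NOT NULL,
  KEY idx_abcd (a, b, c, d),
  KEY idx_phone (phone)
) ENGINE=InnoDB;

-- Tip 2: matching stops at the range on c, so with idx_abcd key_len is expected
-- to be 12 (three 4-byte INT columns); d is not used for the lookup
EXPLAIN SELECT * FROM t_demo WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- With the equality columns first, all four columns can be used (key_len 16)
ALTER TABLE t_demo ADD KEY idx_abdc (a, b, d, c);
EXPLAIN SELECT * FROM t_demo WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- Tip 6: comparing the string column with a number: expect key = NULL
EXPLAIN SELECT * FROM t_demo WHERE phone = 13800000000;
-- Comparing with a string literal can use idx_phone
EXPLAIN SELECT * FROM t_demo WHERE phone = '13800000000';

-- Tip 7: a leading wildcard cannot use idx_phone; a fixed prefix can
EXPLAIN SELECT * FROM t_demo WHERE phone LIKE '%8000%';
EXPLAIN SELECT * FROM t_demo WHERE phone LIKE '138%';
```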

That covers how to use indexes efficiently and correctly for MySQL performance optimization. For more about MySQL indexes, see the related articles on 123WORDPRESS.COM.
