A brief discussion on MySQL index optimization analysis


Why are the SQL queries you write slow? Why do the indexes you create so often go unused? In this chapter you will learn why MySQL performance degrades, what indexes are, the principles for creating them, how to use the explain command, and what each field of its output means. The goal is to help you understand indexes, analyze them, and use them to write faster SQL. What are you waiting for? Roll up your sleeves and get to work!

Case Study

Let's first briefly look at the difference between non-relational and relational databases.

MongoDB is a NoSQL database. NoSQL stands for "Not only SQL", i.e. non-relational databases. They are characterized by high performance, strong scalability, and flexible schemas, and they perform especially well in high-concurrency scenarios. At present, however, they are mainly a supplement to relational databases and still lag behind them in data consistency, data security, and support for complex queries.

MySQL is a relational database with strong query capabilities, high data consistency, high data security, and support for secondary indexes. However, its performance is somewhat lower than MongoDB's, and once a table grows beyond about a million rows, slow queries appear easily. When that happens, you need to analyze why the query is slow. Usually the cause is poorly written SQL, a missing key index, or an index that is not being used.

Our company's ERP system stores most of its data in MongoDB (the NoSQL database closest to a relational one), followed by Redis, with MySQL accounting for only a small part. We are now using MySQL more again because of Alibaba's Qimen and Jushita systems. With orders already exceeding one million, performance analysis of MySQL has become particularly important.

Let's start with two simple examples. The meaning of each explain field is covered in detail later.

Note: The SQL used in this article is on GitHub. If you find it useful, feel free to give it a star.

https://github.com/ITDragonBlog/daydayup/tree/master/MySQL/

Scenario 1: Importing orders and avoiding duplicate orders by transaction number

Business logic: when importing orders, the transaction number is usually queried first to check whether the order already exists, so that duplicate orders are not created.

The most basic SQL statement

mysql> select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| id    | transaction_id     | gross | net  | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date          |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| 10000 | 81X97310V32236260E | 6.6   | 6.13 | 1        | 10           | ok       | ok               | auto        | 1           | itdragon   | 2017-08-18 17:01:49 |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+

mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 3    | 33.33    | Using where |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+

There is nothing wrong with the query itself, and it runs fine in the offline test environment. Once the feature goes live, however, the query becomes slow. A full table scan over millions or even tens of millions of orders? No way!

How do you know the SQL does a full table scan? The explain command shows exactly how MySQL processes a SQL statement. The fields printed above mean:

  1. id: the query sequence number, here 1.
  2. select_type: the query type; SIMPLE is a plain select without UNION or subqueries.
  3. table: the table being read, itdragon_order_list.
  4. partitions: NULL, the table is not partitioned.
  5. type: the join (access) type; ALL means a full table scan.
  6. possible_keys: the indexes that could be used; NULL here.
  7. key: the index actually used; NULL here.
  8. key_len: the length of the index used; naturally also NULL.
  9. ref: the column or constant compared with the index; NULL because no index is used.
  10. Extra: Using where, meaning rows are filtered by the WHERE condition.

Because the table holds only three records, rows and filtered are not very informative here. The key point is that type ALL, a full table scan, has the worst performance. With millions of rows in the table, the query would be extremely slow without the help of an index.

Preliminary optimization: create an index for transaction_id

mysql> create unique index idx_order_transaID on itdragon_order_list (transaction_id);
mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const | 1    | 100      | NULL  |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+

The index created here is a unique index, not a normal index.

For a unique index, type is const: the row is found with a single index lookup, and once it is found the scan ends and the result is returned.

For a normal (non-unique) index, type is ref: a non-unique index scan. After finding the first matching entry, MySQL keeps reading the adjacent index entries until it reaches one whose key no longer matches. (Output not shown here.)
Clearly const performs better than ref, and since a transaction number should never repeat, a unique index also fits the business logic.
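To make the difference between ALL, ref, and const concrete, here is a minimal Python sketch that models an index as a sorted list of (key, row id) pairs and counts how many entries each strategy examines. It is a toy model, not how InnoDB's B+ tree is implemented; the helper names and every sample row except the article's transaction number are hypothetical.

```python
from bisect import bisect_left

def build_index(table, column):
    """Toy index: (key, row_id) pairs kept in sorted order, standing in for a B+ tree."""
    return sorted((row[column], row_id) for row_id, row in table.items())

def full_scan(table, column, value):
    """type = ALL: every row is read and compared."""
    hits = [row_id for row_id, row in table.items() if row[column] == value]
    return hits, len(table)  # (matching row ids, entries examined)

def ref_lookup(index, value):
    """type = ref: seek to the first match, then read neighbours while the key still matches."""
    pos = bisect_left(index, (value,))
    hits, examined = [], 0
    while pos < len(index):
        examined += 1
        key, row_id = index[pos]
        if key != value:
            break  # the index is sorted, so no later entry can match
        hits.append(row_id)
        pos += 1
    return hits, examined

def const_lookup(unique_index, value):
    """type = const: on a unique index one seek finds at most one entry, then the search stops."""
    pos = bisect_left(unique_index, (value,))
    if pos < len(unique_index) and unique_index[pos][0] == value:
        return [unique_index[pos][1]], 1
    return [], 1

# Hypothetical sample rows (only the first transaction_id comes from the article).
orders = {
    10000: {"transaction_id": "81X97310V32236260E", "order_level": 3},
    10001: {"transaction_id": "T2", "order_level": 1},
    10002: {"transaction_id": "T3", "order_level": 3},
}
tx_index = build_index(orders, "transaction_id")  # plays the role of the unique index
level_index = build_index(orders, "order_level")  # a normal, non-unique index
print(full_scan(orders, "transaction_id", "81X97310V32236260E"))  # ([10000], 3)
print(const_lookup(tx_index, "81X97310V32236260E"))               # ([10000], 1)
print(ref_lookup(level_index, 3))                                  # ([10000, 10002], 2)
```

The counts grow very differently with table size: the full scan always examines every row, while the index lookups examine only the matching entries (plus one extra entry for ref to detect the end of the run).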

Optimize again: covering index

mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const | 1    | 100      | Using index |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+

Here select * is changed to select transaction_id, and Extra shows Using index, meaning the query is served by a covering index: every requested column comes from the index itself, without reading the table rows. This is a very good sign for performance. If Extra instead shows Using filesort (an extra sorting pass) or Using temporary (a temporary table), the SQL should be optimized right away.

For the business logic, returning only transaction_id is enough: all we need to know is whether the order already exists.
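Why a covering index saves work can be modelled the same way. In InnoDB, each secondary index entry stores the indexed column plus the primary key, so a query whose select list is limited to those columns never has to go back to the table rows. The sketch below counts those extra trips; the dict-based "clustered index", the helper name, and all values except row 10000 are hypothetical.

```python
from bisect import bisect_left

# Toy clustered index: primary key -> full row (hypothetical values apart from row 10000).
clustered = {
    10000: {"id": 10000, "transaction_id": "81X97310V32236260E", "gross": 6.6, "net": 6.13},
    10001: {"id": 10001, "transaction_id": "T2", "gross": 1.0, "net": 0.9},
}

# Toy secondary index on transaction_id: each entry holds the indexed value plus the primary key.
secondary = sorted((row["transaction_id"], pk) for pk, row in clustered.items())
INDEX_COLUMNS = {"transaction_id", "id"}  # columns available without visiting the row

def lookup(value, columns):
    """Return (result rows, number of trips back to the clustered index)."""
    pos = bisect_left(secondary, (value,))
    results, row_lookups = [], 0
    while pos < len(secondary) and secondary[pos][0] == value:
        key, pk = secondary[pos]
        if set(columns) <= INDEX_COLUMNS:
            # Covered: answer from the index entry alone (Extra: Using index).
            source = {"transaction_id": key, "id": pk}
        else:
            # Not covered: fetch the full row by primary key.
            row_lookups += 1
            source = clustered[pk]
        results.append({c: source[c] for c in columns})
        pos += 1
    return results, row_lookups

print(lookup("81X97310V32236260E", ["transaction_id"]))                       # 0 trips
print(lookup("81X97310V32236260E", ["id", "transaction_id", "gross", "net"]))  # 1 trip
```

With one matching row the saving is a single lookup, but in a range or ref scan the extra trips add up to one per matching row.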

Scenario 2: Order management page, sorting by order level and order entry time

Business logic: show orders with a high order level and the earliest entry time first.
Since this is sorting, the first thing that comes to mind is order by, and a scary Using filesort is waiting for you.

The most basic SQL statement

mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 3    | 100      | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

First, a full table scan is already unreasonable, and Using filesort slows performance down further.

MySQL versions prior to 4.1 used a two-pass sorting algorithm for filesort: because the table data was read twice, I/O took too long. It was later optimized into a single-pass algorithm, which essentially trades memory for time. But if the data is too large for the sort buffer, multiple I/O passes are needed and the result is even worse. Rather than asking your operations colleagues to change the MySQL configuration (for example sort_buffer_size), it is better to build the right index yourself.
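The trade-off between the two filesort algorithms can be sketched in a few lines of Python. The two-pass version keeps only (sort key, row id) pairs in the sort buffer and then reads each row again in sorted order; the single-pass version copies the rows into the buffer up front, reading the table once at the cost of more buffer space. This is a simplified model: the Table class, its read counter (a stand-in for I/O), and the rows are hypothetical, and MySQL's real implementation also spills to temporary files when the buffer fills.

```python
class Table:
    """Hypothetical table that counts row reads as a stand-in for disk I/O."""

    def __init__(self, rows):
        self.rows = rows  # row_id -> row dict
        self.reads = 0

    def scan(self):
        for row_id, row in self.rows.items():
            self.reads += 1
            yield row_id, row

    def fetch(self, row_id):
        self.reads += 1
        return self.rows[row_id]

def two_pass_sort(table, key):
    # Pass 1: read every row but keep only (sort key, row id) in the sort buffer.
    pairs = sorted((key(row), row_id) for row_id, row in table.scan())
    # Pass 2: read the table again, one row at a time, in sorted order.
    return [table.fetch(row_id) for _, row_id in pairs]

def single_pass_sort(table, key):
    # One pass: copy the full rows into the sort buffer and sort them there.
    buffered = [row for _, row in table.scan()]
    return sorted(buffered, key=key)

def by_level_date(row):
    return (row["order_level"], row["input_date"])

rows = {
    1: {"order_level": 3, "input_date": "2017-08-18"},
    2: {"order_level": 1, "input_date": "2017-08-20"},
    3: {"order_level": 1, "input_date": "2017-08-19"},
}
t1, t2 = Table(rows), Table(rows)
same = two_pass_sort(t1, by_level_date) == single_pass_sort(t2, by_level_date)
print(same, t1.reads, t2.reads)  # True 6 3
```

Both produce the same order; the difference is only in how many times the table is read and how much data sits in the buffer.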

Preliminary optimization: create a composite index for order_level, input_date

mysql> create index idx_order_levelDate on itdragon_order_list (order_level,input_date);
mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 3    | 100      | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

After creating the composite index, you may be surprised to find that nothing has changed: still a full table scan, still a filesort. Is the index not being used? Did the index creation fail? Let's look at the following output:

mysql> explain select order_level,input_date from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL | 3    | 100      | Using index |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+

After changing select * to select order_level,input_date, type improves from ALL to index, meaning a full index scan, and Extra shows that a covering index is used. But wait! The query is faster, yet it only returns order_level and input_date. How are my business colleagues supposed to use that? Should we create a composite index covering every field?

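MySQL is not that stupid. The optimizer ignored the index for select * because reading the entire index and then fetching every row through it was estimated to cost more than a full table scan plus a filesort, which is especially likely on a table this small. You can override that choice with force index: just add force index(idx_order_levelDate) to the original SQL statement.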

mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL | 3    | 100      | NULL  |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+

Optimize again: Do order levels really need to be sorted?

In fact, there is little point in sorting by order level, and little point in indexing it either, because order_level only takes a few values: low, medium, high, and expedited. For fields with such repetitive, evenly distributed values, sorting and indexing help very little.
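One common way to put a number on "repetitive and evenly distributed" is index selectivity: the number of distinct values divided by the number of rows (MySQL reports an estimate of the distinct count as Cardinality in SHOW INDEX). Values near 1 make a good index; values near 0 mean each key matches a large share of the table. A minimal sketch with hypothetical data:

```python
def selectivity(values):
    """Distinct values / total rows; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Hypothetical 1,000 orders: four order levels versus unique transaction ids.
levels = [i % 4 for i in range(1000)]
transaction_ids = [f"T{i}" for i in range(1000)]
print(selectivity(levels))           # 0.004
print(selectivity(transaction_ids))  # 1.0
```

With four levels, every lookup on order_level still matches about a quarter of the table, which is why the index alone buys little here.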

Can we fix the value of order_level first and then sort by input_date? If this improves the query noticeably, you can recommend this query pattern to your business colleagues.

mysql> explain select * from itdragon_order_list where order_level=3 order by input_date;
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table               | partitions | type | possible_keys       | key                 | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
| 1  | SIMPLE      | itdragon_order_list | NULL       | ref  | idx_order_levelDate | idx_order_levelDate | 5       | const | 1    | 100      | Using index condition |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+

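Compared with the previous SQL, type improves from index to ref (a non-unique index scan). key_len drops from 68 to 5, indicating that only the first column of the composite index (order_level) is used for the lookup, and ref is const because it is compared with the constant 3. Using filesort is gone as well, because within one order_level value the index entries are already ordered by input_date. Extra shows Using index condition, i.e. Index Condition Pushdown: conditions on indexed columns are checked against the index entries before full rows are read. Overall, the performance is much better than the previous SQL.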
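Why the filesort disappears can be shown with a sorted list of tuples standing in for the composite index on (order_level, input_date): entries are ordered by order_level first and by input_date within each level. Fixing order_level with an equality therefore selects one contiguous slice that is already in input_date order. The rows other than 10000 are hypothetical.

```python
from bisect import bisect_left

# Toy composite index on (order_level, input_date); each entry also carries the row id.
entries = sorted([
    (3, "2017-08-18 17:01:49", 10000),
    (1, "2017-08-19 09:00:00", 10001),
    (3, "2017-08-17 08:30:00", 10002),
    (2, "2017-08-16 12:00:00", 10003),
])

def level_ordered_by_date(level):
    """WHERE order_level = level ORDER BY input_date, answered from one index range."""
    lo = bisect_left(entries, (level,))
    hi = bisect_left(entries, (level + 1,))
    # No sort step needed: inside one order_level the entries are already ordered by input_date.
    return [row_id for _, _, row_id in entries[lo:hi]]

# Reading the whole index front to back gives ORDER BY order_level, input_date for free.
print([row_id for _, _, row_id in entries])  # [10001, 10003, 10002, 10000]
print(level_ordered_by_date(3))               # [10002, 10000]
```

The same property is what lets the forced full index scan earlier return rows in order without Using filesort.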

The two cases above are just a quick introduction. One thing to remember: optimization serves the business logic. Never change the business logic on your own just to make a query faster. Of course, if the change can be agreed on, so much the better.

Index Introduction

Official definition: Index is a data structure that helps MySQL retrieve data efficiently.

You may be curious why an index is a data structure and how it speeds up queries. Let's use the familiar binary tree to analyze how an index works.

Take a look at the following image:
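The lookup a binary search tree performs can be sketched in Python: each node holds a key, smaller keys go left and larger keys go right, so in a balanced tree every comparison discards about half of the remaining keys. This is a teaching model only; InnoDB actually uses B+ trees, whose wide nodes keep the tree just a few levels deep. The class, functions, and keys are hypothetical.

```python
class Node:
    def __init__(self, key, row_id):
        self.key, self.row_id = key, row_id
        self.left = self.right = None

def insert(root, key, row_id):
    """Insert a key: smaller keys go to the left subtree, larger or equal ones to the right."""
    if root is None:
        return Node(key, row_id)
    if key < root.key:
        root.left = insert(root.left, key, row_id)
    else:
        root.right = insert(root.right, key, row_id)
    return root

def search(root, key):
    """Return (row_id or None, number of nodes visited)."""
    node, visits = root, 0
    while node is not None:
        visits += 1
        if key == node.key:
            return node.row_id, visits
        node = node.left if key < node.key else node.right
    return None, visits

root = None
for key in [50, 25, 75, 12, 37, 62, 87]:  # hypothetical keys; this order gives a balanced tree
    root = insert(root, key, row_id=key * 10)
print(search(root, 37))  # (370, 3): 50 -> 25 -> 37
print(search(root, 40))  # (None, 3): not found after three comparisons
```

A linear scan of the same seven keys may need all seven comparisons; the tree never needs more than three.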

Advantages of creating indexes

1. Improves data retrieval speed and reduces database I/O cost: indexes speed up searches by reducing the number of records that have to be examined in the table.

2. Reduces the cost of sorting and CPU consumption: an index is fast to search because its data is kept sorted in advance. If a query needs the data sorted on that field, the sorting work is already done.

Disadvantages of Creating Indexes

1. Occupies storage space: an index is essentially a table that records the primary key and the indexed fields, and it is generally stored on disk as an index file.

2. Slows down table updates: when the data in the table changes, the corresponding index must change too, which slows updates. Otherwise the index could point to incorrect physical data, which is also one of the reasons indexes fail.

3. High-quality indexes are hard to create: index design is not a one-off job, nor does it stay fixed. Indexes need to be revisited regularly based on user behavior and the specific business logic.

Index Classification

The index we usually talk about is organized as a BTree (multi-way search tree). Clustered indexes, secondary indexes, composite indexes, prefix indexes, and unique indexes all use this structure and are collectively referred to as indexes. Besides B+ trees, there are other kinds such as hash indexes.

  1. Single-value index: An index contains only a single column. A table can have multiple single-column indexes.
  2. Unique index: the values of the index column must be unique, but NULL values are allowed.
  3. Composite index: an index that contains multiple columns.

In actual development, it is recommended to use composite indexes, and the number of indexes created for a single table should not exceed five.

Basic syntax:

create:

create [unique] index indexName on tableName (columnName...)
alter table tableName add [unique] index [indexName] (columnName...)

delete:

drop index indexName on tableName

View:

show index from tableName

In which cases do you need to create an index?

1. Primary keys and unique indexes
2. Fields that are often used as query conditions
3. Fields that are often sorted, grouped, or aggregated
4. Fields used as foreign keys in queries that join other tables

When not to create an index:

1. The table has too few records; as a rough rule of thumb, tables under a million rows do not need one.
2. Tables with very frequent inserts, deletes, and updates
3. Fields with highly repetitive, evenly distributed values, such as true/false
4. Fields that are updated frequently
5. Fields not used in WHERE conditions

Performance Analysis

MySQL’s own bottleneck

MySQL's own performance problems include insufficient disk space, heavy disk I/O, and weak server hardware.
1. CPU: CPU saturation usually occurs when data is loaded into memory or read from disk.
2. IO: disk I/O becomes the bottleneck when the data being loaded is much larger than memory.
3. Server hardware: use top, free, iostat, and vmstat to check the system's performance status.

explain analyzes SQL statements

The explain keyword shows the execution plan the optimizer chooses for a SQL query, so you can see how MySQL processes the statement.

+----+-------------+-------+------------+------+---------------+-----+---------+-----+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----+---------+-----+------+----------+-------+

id

The sequence number of the select query: a set of numbers (possibly repeated) indicating the order in which the SELECT clauses in the query are executed. There are generally three situations:
The first: all ids are the same, and execution runs from top to bottom.
The second: all ids are different; the larger the id, the higher its priority and the earlier it executes.
The third: some ids are the same and some differ. The largest id executes first, and rows with the same id execute from top to bottom.
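The three rules collapse into one sort: larger id first, and for equal ids the top-to-bottom position in the EXPLAIN output. A minimal sketch, assuming the plan is given as (id, table) pairs in output order; the table names are hypothetical.

```python
def execution_order(plan):
    """plan: (id, table) pairs in EXPLAIN output order -> tables in execution order."""
    # Larger id first; sorted() is stable, so equal ids keep their top-to-bottom order.
    ranked = sorted(plan, key=lambda row: -row[0])
    return [table for _, table in ranked]

# Hypothetical plans for the three cases described above.
print(execution_order([(1, "t1"), (1, "t2"), (1, "t3")]))          # ['t1', 't2', 't3']
print(execution_order([(1, "t1"), (2, "t2"), (3, "t3")]))          # ['t3', 't2', 't1']
print(execution_order([(1, "<derived2>"), (1, "t3"), (2, "t1")]))  # ['t1', '<derived2>', 't3']
```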

select_type

The type of select query, mainly used to distinguish simple queries, union queries, and nested subqueries.
simple: a simple select query that contains no subqueries or unions
primary: if the query contains any complex subparts, the outermost query is marked as primary
subquery: a subquery in the select list or where clause
derived: subqueries in the from list are marked as derived; MySQL executes them recursively and puts the results in a temporary table
union: a select that appears after union is marked as union; if the union is inside a subquery in the from clause, the outer select is marked as derived
union result: the select that reads the result from the union's temporary table

partitions

The partitions matched by the query. For example, if you want to analyze ten years of company orders, you can split the data into ten partitions, one per year; a query for a single year then reads only that partition, which can greatly improve query efficiency.
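The benefit comes from partition pruning: when the WHERE clause restricts the partitioning column, MySQL reads only the partitions that can contain matching rows, and EXPLAIN lists them in the partitions column. The sketch below mimics yearly RANGE partitions; the partition names p2008 to p2017 and the year range are hypothetical.

```python
def partitions_for(start_year, end_year, first=2008, last=2017):
    """Yearly RANGE partitions p<first>..p<last>; return those a year range must read."""
    lo, hi = max(start_year, first), min(end_year, last)
    return [f"p{year}" for year in range(lo, hi + 1)]

print(partitions_for(2017, 2017))  # ['p2017']: the other nine partitions are pruned
print(partitions_for(2015, 2016))  # ['p2015', 'p2016']
```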

type

This is a very important field: the join type, i.e. how MySQL accesses the table. The common values are ALL, index, range, ref, eq_ref, const, system, and NULL, eight levels in total.
Performance from best to worst: system > const > eq_ref > ref > range > index > ALL
For a Java programmer, making sure queries reach at least the range level, ideally ref, is the mark of an excellent and responsible programmer.
ALL: (full table scan) undoubtedly the worst. With millions or tens of millions of rows, a full table scan is very slow.
index: (full index scan) scans the whole index file, which is much better than ALL, since reading the index tree is faster than reading the whole table.
range: retrieves only the rows within a given range, using the index to select them. Because the range is narrowed, it is faster than a full table scan or a full index scan. The SQL usually contains conditions such as between, in, >, or <.
ref: a non-unique index scan, i.e. an index access that returns all rows matching a single value. For example, querying all colleagues in the company's R&D team matches multiple rows.
eq_ref: a unique index scan used in joins: for each row from the preceding table, exactly one matching row is read from this table through a primary key or unique NOT NULL index. For example, joining each employee to the single record of their department.
const: the row is found with one index lookup. const is used when a primary key or unique index is compared with a constant. Because only one row can match, MySQL can treat the value as a constant if the primary key is placed in the where list.
system: the table has only one row (like a system table). This is a special case of const that rarely appears in practice; just be aware of it.
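The ranking above can be turned into a small review check, for example to flag plans whose type is worse than range. A sketch, assuming the EXPLAIN type values are already available as strings (fetching them is left to your client library); the function name and threshold default are hypothetical.

```python
# Best-to-worst order of the access types listed above.
TYPE_RANK = ["system", "const", "eq_ref", "ref", "range", "index", "all"]

def needs_attention(access_type, threshold="range"):
    """True when an EXPLAIN type is worse than the threshold.

    Types not in TYPE_RANK (e.g. index_merge, ref_or_null) raise ValueError in this sketch.
    """
    return TYPE_RANK.index(access_type.lower()) > TYPE_RANK.index(threshold.lower())

print([t for t in ["const", "ALL", "ref", "index"] if needs_attention(t)])  # ['ALL', 'index']
```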

possible_keys

Displays the indexes that may be used by the query statement (one or more or null), which may not be actually used by the query. For reference only.

key

Displays the index actually used by the query statement. If null, it means no index is used.

key_len

Displays the number of index bytes used by the query, so key_len tells you how much of the index (for a composite index, how many of its columns) is actually used. Without losing accuracy, the shorter the better. key_len shows the maximum possible length of the index fields, not the length of the stored data: it is calculated from the table definition, not read from the table.
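key_len can be reproduced from the column definitions. A common rule for InnoDB: a fixed-size type contributes its storage size (INT is 4 bytes), a VARCHAR(n) contributes n times the maximum bytes per character of its character set plus 2 length bytes, and a nullable column adds 1 byte. The article does not show the table definition, so the column types below are an assumption, chosen because they reproduce the 453, 5, and 68 seen above: transaction_id VARCHAR(150) and input_date VARCHAR(20) in utf8 (3 bytes per character), order_level INT, all nullable. Other definitions could give the same numbers.

```python
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}
FIXED_SIZE = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8, "date": 3}

def column_key_len(col_type, length=None, charset="utf8", nullable=True):
    """Bytes one column contributes to key_len."""
    if col_type in FIXED_SIZE:
        size = FIXED_SIZE[col_type]
    elif col_type == "char":
        size = length * BYTES_PER_CHAR[charset]
    elif col_type == "varchar":
        size = length * BYTES_PER_CHAR[charset] + 2  # 2 bytes for the length prefix
    else:
        raise ValueError(f"type not modelled in this sketch: {col_type}")
    return size + (1 if nullable else 0)  # 1 byte for the NULL flag

def key_len(columns):
    """Sum over the index columns the query actually uses."""
    return sum(column_key_len(*col) for col in columns)

print(key_len([("varchar", 150)]))           # 453: transaction_id
print(key_len([("int",)]))                   # 5:   order_level only
print(key_len([("int",), ("varchar", 20)]))  # 68:  order_level + input_date
```

Under these assumptions, the 5 in Scenario 2 confirms that only order_level was used for the lookup, while the full index scan reports both columns.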

ref

Shows which columns or constants are compared with the index named in key to look up rows.

rows

Based on the table statistics and index selection, it roughly estimates the number of rows that need to be read to find the required records. The larger the value, the worse it is.

extra

Using filesort: MySQL has to perform an extra sort instead of reading rows in index order. A sort that MySQL cannot satisfy with an index is called a "filesort". If this appears, the SQL should be optimized right away.
Using temporary: a temporary table is used to hold intermediate results, commonly for order by and group by. If this appears, the SQL should be optimized right away.
Using index: a covering index is used by the select, so the table's data rows are not accessed. Very good! If Using where appears at the same time, the index is used to look up index key values; without Using where, the index is used to read the data rather than to perform lookups.
Covering Index: also called index covering. The selected columns can be obtained from the index alone, so MySQL returns the fields in the select list without reading the data file through the index.
Using index condition: Index Condition Pushdown (ICP), added in MySQL 5.6. The parts of the WHERE condition that involve indexed columns are checked by the storage engine against the index entries, so full table rows are read only for entries that pass.
Using where: a WHERE condition is used to filter rows
Using join buffer: the join buffer is used
Impossible where: the WHERE condition is always false, so no rows can be returned
distinct: optimizes the DISTINCT operation by stopping the search for more rows with the same values after the first match is found.
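Using index condition can be illustrated with a toy model of a range scan over an (order_level, input_date) index. With pushdown, a condition on input_date that cannot narrow the index range (here, matching the day of the month) is tested on the index entry before the row is fetched; without it, every entry in the range costs a row fetch and the filter runs afterwards. The function, rows, and condition are hypothetical.

```python
def range_scan(index, rows, level, date_filter, pushdown):
    """WHERE order_level = level AND date_filter(input_date) over an (order_level, input_date) index.

    Returns (matching row ids, number of full-row fetches).
    """
    matches, fetches = [], 0
    for entry_level, entry_date, row_id in index:
        if entry_level != level:
            continue  # stands in for the index range on order_level
        if pushdown and not date_filter(entry_date):
            continue  # ICP: rejected from the index entry, so the row is never read
        fetches += 1
        row = rows[row_id]  # read the full row from the table
        if date_filter(row["input_date"]):  # server-side WHERE check
            matches.append(row_id)
    return matches, fetches

rows = {
    10000: {"input_date": "2017-08-18 17:01:49"},
    10001: {"input_date": "2017-08-19 09:00:00"},
    10002: {"input_date": "2017-08-17 08:30:00"},
}
index = sorted((3, row["input_date"], row_id) for row_id, row in rows.items())

def on_the_18th(date):
    return date[8:10] == "18"

print(range_scan(index, rows, 3, on_the_18th, pushdown=False))  # ([10000], 3)
print(range_scan(index, rows, 3, on_the_18th, pushdown=True))   # ([10000], 1)
```

The result is the same either way; pushdown only reduces how many full rows are read.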

filtered

A percentage estimating how many of the examined rows remain after the table's conditions are applied. Multiplied by the rows column, it estimates the number of rows passed on to the next table in the query execution plan (QEP), which determines how many times the join loop iterates. This is why small tables should drive large tables: fewer driving rows means fewer join iterations.
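As a worked example, rows × filtered / 100 is the optimizer's estimate of the rows this table hands to the next one. For the first EXPLAIN in this article (rows = 3, filtered = 33.33) that is about one row. A small helper, with a hypothetical join to show why the driving table should be the smaller one:

```python
def rows_out(rows, filtered):
    """Estimated rows this table passes to the next one: rows * filtered / 100."""
    return rows * filtered / 100

print(round(rows_out(3, 33.33)))  # 1: the first EXPLAIN in this article
# Hypothetical join: each row leaving the driving table triggers one probe of the other table.
print(rows_out(100, 100))      # 100.0 iterations when a 100-row table drives
print(rows_out(100_000, 100))  # 100000.0 iterations when a 100,000-row table drives
```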

From the explain fields introduced above, we can learn:
1. The order in which tables are read (id)
2. The type of data read operation (type)
3. Which indexes are actually used (key)
4. References between tables (ref)
5. How many rows of each table the optimizer expects to examine (rows)

Reasons for performance degradation

From a programmer's perspective:
1. The query statement is poorly written
2. No index exists, the index is poorly designed, or the index is not used
3. The query joins too many tables

From the server's perspective:
1. Insufficient server disk space
2. Unreasonable server tuning parameters

Summary

1. An index is a sorted data structure that is fast to search; its purpose is to make queries more efficient.
2. With an index, queries become faster but updates become slower.
3. Performance degradation is most often caused by indexes that are not being used.
4. Principles of index creation: frequently queried fields suit indexing; frequently updated data does not.
5. Frequent updates to indexed fields or physical deletion of table data can easily cause index failure.
6. Use explain to analyze SQL statements.
7. Besides optimizing SQL statements, you can also optimize the table design, for example by favoring single-table queries to reduce joins between tables, or by designing archive tables.

This concludes the MySQL index optimization analysis. If you find anything wrong, please point it out.
