Why are the SQL queries you write slow? Why do the indexes you create often fail to take effect? This chapter covers the causes of MySQL performance degradation, an introduction to indexes, the principles of index creation, the use of the explain command, and the meaning of the fields in explain output. The goal is to help you understand, analyze, and use indexes so that you can write higher-performance SQL. What are you waiting for? Roll up your sleeves and get to work!

Case Study

Let's first briefly look at the difference between non-relational and relational databases.

MongoDB is a NoSQL database. NoSQL stands for "Not only SQL" and refers to non-relational databases. They are characterized by high performance, strong scalability, and a flexible schema, and they perform particularly well in high-concurrency scenarios. For now, though, they are only a supplement to relational databases, and there is still a gap in data consistency, data security, and query complexity.

MySQL is a relational database with strong query capabilities, high data consistency, high data security, and support for secondary indexes. However, its performance is slightly inferior to MongoDB's, especially on tables with more than a million rows, where slow queries appear easily. When that happens, you need to analyze why the query is slow. Usually the cause is poorly written SQL, a missing index, or an index that is not being used.

Our company's ERP system mainly uses MongoDB (the NoSQL database closest to a relational one), followed by Redis; MySQL accounts for only a small part. We are now picking MySQL up again because of Alibaba's Qimen and Jushita systems. Since the number of orders already exceeds one million, performance analysis of MySQL is particularly important.

Let's get started with two simple examples. The function and meaning of each parameter are introduced in detail later.
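The examples below query a table called itdragon_order_list. Its definition lives in the GitHub repository rather than in this article, so here is a minimal sketch of what it might look like, inferred from the column names in the query output and from the key_len values that appear later. The column types, lengths, character set, and the two extra sample rows are assumptions, not the author's actual script.

```sql
-- Hypothetical schema: column names come from the article's query output;
-- types, lengths, and character set are assumptions for illustration.
CREATE TABLE itdragon_order_list (
  id               BIGINT       NOT NULL AUTO_INCREMENT,
  transaction_id   VARCHAR(150) DEFAULT NULL,  -- transaction number
  gross            DOUBLE       DEFAULT NULL,
  net              DOUBLE       DEFAULT NULL,
  stock_id         INT          DEFAULT NULL,
  order_status     INT          DEFAULT NULL,
  descript         VARCHAR(255) DEFAULT NULL,
  finance_descript VARCHAR(255) DEFAULT NULL,
  create_type      VARCHAR(100) DEFAULT NULL,
  order_level      INT          DEFAULT NULL,
  input_user       VARCHAR(100) DEFAULT NULL,
  input_date       VARCHAR(20)  DEFAULT NULL,  -- stored as text in this sketch
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=10000 DEFAULT CHARSET=utf8mb3;

-- The first row matches the article's output; the other two are made up.
INSERT INTO itdragon_order_list
  (transaction_id, gross, net, stock_id, order_status, descript,
   finance_descript, create_type, order_level, input_user, input_date)
VALUES
  ('81X97310V32236260E', 6.6, 6.13, 1, 10, 'ok', 'ok', 'auto', 1, 'itdragon', '2017-08-18 17:01:49'),
  ('61525478BB371361Q', 18.88, 18.79, 1, 10, 'ok', 'ok', 'auto', 1, 'itdragon', '2017-08-18 17:01:50'),
  ('5RT64180WE555861V', 20.18, 20.17, 1, 10, 'ok', 'ok', 'auto', 1, 'itdragon', '2017-08-19 08:01:49');
```

With a 3-byte character set, an index entry on a nullable VARCHAR(150) column takes 150 × 3 + 2 + 1 = 453 bytes, which is consistent with the key_len reported for the transaction_id index later on.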
Note: The SQL used here has been placed on GitHub. If you like it, you can give it a star. https://github.com/ITDragonBlog/daydayup/tree/master/MySQL/

Scenario 1: Importing orders and avoiding duplicate orders by transaction number

Business logic: When importing orders, to avoid duplicates, the transaction number is generally used to query the database and determine whether the order already exists.

The most basic SQL statement

mysql> select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| id    | transaction_id     | gross | net  | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date          |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| 10000 | 81X97310V32236260E |   6.6 | 6.13 |        1 |           10 | ok       | ok               | auto        |           1 | itdragon   | 2017-08-18 17:01:49 |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+

mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+

There is no problem with the query itself, and nothing goes wrong in the offline test environment. Once the feature goes live, however, slow queries appear. Millions or even tens of millions of orders, and a full table scan? Seriously?

How do you know that the SQL does a full table scan? The explain command clearly shows how MySQL processes a SQL statement. The printed contents represent:
Because there are only three records in the table, the rows and filtered values are not very informative here. The key point is that type ALL means a full table scan, which is the worst-performing access type. With millions of rows in the table and no index to help, the query would be extremely slow.

Preliminary optimization: create an index for transaction_id

mysql> create unique index idx_order_transaID on itdragon_order_list (transaction_id);
mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const |    1 |      100 | NULL  |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+

The index created here is a unique index, not a normal index. With a unique index the type is const: the row is found with a single index lookup, and once the value is found the scan ends and the result is returned. With a normal index the type is ref, a non-unique index scan: after finding a match, MySQL keeps reading index entries until it has returned every row that matches the value.
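For comparison, the ref case can be reproduced by creating the same index without the unique keyword. This is only a sketch: the index name idx_order_transaID_normal is made up for illustration, and the unique index is dropped first so that the optimizer does not simply prefer it.

```sql
-- Replace the unique index with a normal (non-unique) one on the same column.
drop index idx_order_transaID on itdragon_order_list;
create index idx_order_transaID_normal on itdragon_order_list (transaction_id);

-- With a non-unique index, an equality lookup is expected to show type = ref.
explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";

-- Restore the unique index used in the rest of the article.
drop index idx_order_transaID_normal on itdragon_order_list;
create unique index idx_order_transaID on itdragon_order_list (transaction_id);
```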
(The original article posts no code for the normal-index case.)

Optimize again: covering index

mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const |    1 |      100 | Using index |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+

Here select * from is changed to select transaction_id from, and Extra shows Using index, which means the query is served by a covering index. That is very good news: the statement performs well. If Extra instead shows Using filesort (an extra sort pass) or Using temporary (a temporary table), the SQL needs to be optimized right away. As far as the business logic goes, a query result that returns only transaction_id is enough to meet the requirement.

Scenario 2: Order management page, sorting by order level and order entry time

Business logic: Prioritize orders that have a high order level and were entered long ago.
The most basic SQL statement

mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |      100 | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

First of all, a full table scan is unreasonable here, and Using filesort slows things down further. MySQL versions before 4.1 used a two-pass sort algorithm for filesort; because the disk was read twice, the I/O cost was high. It was later optimized into a single-pass algorithm, which essentially trades space for time. But if the data set is too large for the sort buffer, multiple I/O passes are still needed and the result can be even worse. Rather than asking your operations colleagues to change the MySQL configuration, it is better to build an index yourself.
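Before reaching for an index, it can help to see how the server is currently sorting. The statements below are a small sketch using standard MySQL system and status variables; the values returned depend entirely on your server, so none are shown here.

```sql
-- Size of the per-session buffer that filesort tries to sort in.
show variables like 'sort_buffer_size';

-- Number of merge passes the sort algorithm has had to do;
-- a steadily growing value suggests sorts are spilling past the buffer.
show global status like 'Sort_merge_passes';

-- How many sorts were done using a range versus a full scan of the table.
show global status like 'Sort_range';
show global status like 'Sort_scan';
```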
Preliminary optimization: create a composite index for order_level, input_date

mysql> create index idx_order_levelDate on itdragon_order_list (order_level,input_date);
mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |      100 | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

After creating the composite index, you may be surprised to find that nothing has changed. It is still a full table scan, and it still uses filesort. Is the index invalid? Did the index creation fail? Let's look at the following output:

mysql> explain select order_level,input_date from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL |    3 |      100 | Using index |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+

Here select * from has been changed to select order_level,input_date from.
The type is upgraded from ALL to index, which indicates a full index scan, and Extra shows that a covering index is used. But that's not right! The lookup is faster, but the result only contains two fields, order_level and input_date. How are my business colleagues supposed to use that? Should we create a composite index covering every field?

MySQL is not that dumb. You can use force index to force a specific index: just add force index(idx_order_levelDate) to the original SQL statement.

mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL |    3 |      100 | NULL  |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+

Optimize again: Do order levels really need to be sorted?

In fact, sorting by order level has little value, and neither does indexing it, because order_level has only four possible values: low, medium, high, and expedited. For a field with so few distinct, evenly distributed values, sorting and indexing help very little.

Can we fix the value of order_level first and then sort by input_date? If the improvement is noticeable, you can recommend this query to your business colleagues.
mysql> explain select * from itdragon_order_list where order_level=3 order by input_date;
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table               | partitions | type | possible_keys       | key                 | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ref  | idx_order_levelDate | idx_order_levelDate | 5       | const |    1 |      100 | Using index condition |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+

Compared with the previous SQL, type is upgraded from index to ref (a non-unique index scan). key_len drops from 68 to 5, which shows that only the first column of the composite index (order_level) is used for the lookup. ref is const, meaning the index is compared against a constant. Extra is Using index condition, which indicates index condition pushdown: MySQL checks the condition against the index entries first and only then decides whether to read the full row. Overall, the performance is much better than that of the previous SQL.

The two cases above are only a quick introduction. One thing to remember: optimization is based on business logic. Never change the business logic on your own just for the sake of optimization. Of course, if it can be changed, so much the better.

Index Introduction

Official definition: An index is a data structure that helps MySQL retrieve data efficiently.

You may be curious why an index is a data structure and how it speeds up queries. Let's take the most commonly used binary tree to analyze how an index works.
Take a look at the following image:

Advantages of creating indexes
1 Improves data retrieval speed and reduces database I/O cost: an index speeds up a search by reducing the number of records in the table that have to be examined.
2 Reduces the cost of sorting and lowers CPU consumption: an index is fast to search because its data is already sorted. If the query happens to sort on the indexed field, the cost of sorting drops considerably.

Disadvantages of creating indexes
1 Occupies storage space: an index is effectively a table that records the primary key and the indexed fields, and it is generally stored on disk as an index file.
2 Slows down table updates: when the data in the table changes, the corresponding index must be changed too, which slows updates. Otherwise the physical data the index points to could be wrong, which is also one of the causes of index failure.
3 High-quality indexes are hard to create: index design is not done in a day, nor does it stay fixed. Indexes need to be revisited regularly based on user behavior and the specific business logic.

Index Classification

The indexes we usually talk about are organized as a BTree (multi-way search tree). Clustered indexes, secondary indexes, composite indexes, prefix indexes, and unique indexes are all collectively referred to as indexes. Besides B+ trees, there are also hash indexes and others.
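To make the classification concrete, the following sketch shows how several of these index types could be declared on the itdragon_order_list table from the case study. The index names and the prefix length of 10 are illustrative assumptions, not recommendations from the article.

```sql
-- Unique index: no two rows may share the same transaction_id.
create unique index idx_demo_unique on itdragon_order_list (transaction_id);

-- Composite index: one index over several columns, used from left to right.
create index idx_demo_composite on itdragon_order_list (order_level, input_date);

-- Prefix index: only the first 10 characters of the column are indexed,
-- which saves space at the cost of selectivity.
create index idx_demo_prefix on itdragon_order_list (descript(10));

-- In InnoDB the primary key is the clustered index; every index above is a
-- secondary index that stores the primary key value to locate the row.
show index from itdragon_order_list;
```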
In actual development, composite indexes are recommended, and it is best not to create more than five indexes on a single table.

Basic syntax:
create:
create [unique] index indexName on tableName (columnName...)
alter table tableName add [unique] index [indexName] (columnName...)
delete:
drop index indexName on tableName
check:
show index from tableName

In which cases do you need to create an index?
1 Primary key, unique index

In which cases should you not create an index?
1 The table has too few records. As a rule of thumb, a table with fewer than one million rows does not need an index.

Performance Analysis

MySQL's own bottleneck
MySQL's own performance problems include insufficient disk space, heavy disk I/O, and underpowered server hardware.

explain analyzes SQL statements
The explain keyword simulates how the optimizer would execute a SQL query, which shows how MySQL processes the statement.

+----+-------------+-------+------------+------+---------------+-----+---------+-----+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----+---------+-----+------+----------+-------+

id
The sequence number of the select query: a set of numbers, possibly repeated, that indicates the order in which the select clauses of the query are executed. There are generally three situations:

select_type
The type of the select query, used mainly to distinguish ordinary queries, union queries, and nested subqueries.

partitions
The partitions the table uses. For example, to report on ten years of company orders you could split the data into ten partitions, one per year, which can greatly improve query efficiency.

type
A very important field: the join (access) type. The common values, from worst to best, are all, index, range, ref, eq_ref, const, system, null, eight levels in total.
possible_keys
The indexes the query could use (one, several, or null). They are not necessarily used by the query; for reference only.

key
The index the query actually uses. If null, no index is used.

key_len
The number of bytes of the index that are used. key_len lets you work out how much of the index the query uses. Without losing accuracy, the shorter the better. The value shown is the maximum possible length of the index fields, not the actual length used; that is, key_len is computed from the table definition, not retrieved from the table.

ref
Shows which column or constant is compared with the index column to look up values.

rows
A rough estimate, based on table statistics and the chosen index, of the number of rows that must be read to find the required records. The larger the value, the worse.

extra
Using filesort: MySQL sorts the rows with an extra sort step instead of reading them in index order. A sort that MySQL cannot carry out through an index is called a "file sort". If this appears, optimize the SQL right away.

filtered
A percentage that, together with the rows column, estimates the size of the result set passed on from the previous table in the query execution plan (QEP), and therefore the number of iterations of the join. Let small tables drive large tables to reduce the number of join iterations.

From this introduction to the explain fields, we can see:

Reasons for performance degradation
From a programmer's perspective
From the server's perspective

Summarize
1 An index is a data structure that is sorted and fast to search. Its purpose is to improve query efficiency.

This concludes the MySQL index optimization analysis. If you find anything wrong, please point it out. If you think it's good, you can click to recommend it.