A case study to thoroughly understand how to correctly use a MySQL InnoDB joint index

A business query fetches the 5 most recently reviewed records:

SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
 AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;

Monitoring at the time showed CPU usage above 100%, and SHOW PROCESSLIST showed many similar queries in the "Creating sort index" state.

The table structure is as follows:

CREATE TABLE `th_content` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT 'Content title',
 `content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT 'Main content',
 `audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'Audit time',
 `last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last edit time',
 `status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT 'Information status',
 PRIMARY KEY (`id`),
 KEY `idx_at_let` (`audit_time`,`last_edit_time`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The table has a joint index whose leftmost column is audit_time, and there is no index on status.
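This has two consequences: status can only be checked after going back to the table, and the order of idx_at_let does not match the ORDER BY clause. The second point can be seen in a toy Python model (three made-up rows; the name Entry is hypothetical). It relies on the fact that an InnoDB secondary index entry carries the primary key after its declared columns.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    id: int
    audit_time: int
    last_edit_time: int  # simplified to an integer timestamp

# Hypothetical rows; two of them share the same audit_time
rows = [Entry(1, 100, 900), Entry(2, 100, 500), Entry(3, 200, 700)]

# InnoDB appends the primary key to secondary index entries,
# so idx_at_let is ordered by (audit_time, last_edit_time, id)
index_order = sorted(rows, key=lambda r: (r.audit_time, r.last_edit_time, r.id))
index_backward = [r.id for r in reversed(index_order)]

# Order the query asks for: ORDER BY audit_time DESC, id DESC
wanted = [r.id for r in sorted(rows, key=lambda r: (r.audit_time, r.id), reverse=True)]

print(index_backward)  # [3, 1, 2]
print(wanted)          # [3, 2, 1]
```

Whenever audit_time values tie, reading the index backwards does not give the requested order, so MySQL has to sort the rows, which is consistent with the "Creating sort index" state seen above.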

The SQL above executes as follows:

  • From the joint index, find the primary key IDs of all rows whose audit_time is less than the given timestamp (if 1 million rows were reviewed before this timestamp, the primary key IDs of all 1 million rows are read from the joint index)
  • If an optimization such as Multi-Range Read (MRR) is in effect, these 1 million primary key IDs are sorted first, so that when going back to the table in the next step, primary keys that are close together may be fetched with a single disk I/O
  • Go back to the table for each of the 1 million rows one by one, and keep only the rows with status='ONLINE'
  • Finally, sort the matching rows (if 500,000 of them are ONLINE, all 500,000 rows have to be sorted)
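The four steps above can be modeled in Python to make the cost visible. This is a toy sketch, not MySQL's code: a dict stands in for the clustered index, the names Row and original_plan are hypothetical, and the data (1,000 rows, every second one ONLINE) is made up.

```python
from dataclasses import dataclass

@dataclass
class Row:
    id: int
    audit_time: int
    status: str

def original_plan(table, cutoff, limit=5):
    """Toy model of the plan that uses idx_at_let(audit_time, last_edit_time)."""
    # Step 1: the range scan on the joint index yields EVERY primary key with audit_time < cutoff
    pks = [r.id for r in table.values() if r.audit_time < cutoff]
    # Step 2 (MRR-style, optional): sort the primary keys so nearby rows can share a disk read
    pks.sort()
    # Step 3: go back to the table for each primary key, then filter on status
    fetched = [table[pk] for pk in pks]
    online = [r for r in fetched if r.status == "ONLINE"]
    # Step 4: sort all matching rows, and only then keep the first `limit`
    online.sort(key=lambda r: (r.audit_time, r.id), reverse=True)
    stats = {"table_lookups": len(fetched), "rows_sorted": len(online)}
    return [r.id for r in online[:limit]], stats

# Hypothetical data: 1,000 reviewed rows, every second one ONLINE
table = {
    i: Row(id=i, audit_time=i, status="ONLINE" if i % 2 == 0 else "OFFLINE")
    for i in range(1, 1001)
}
ids, stats = original_plan(table, cutoff=2000)
print(ids)    # [1000, 998, 996, 994, 992]
print(stats)  # {'table_lookups': 1000, 'rows_sorted': 500}
```

Only 5 ids come back, yet every row was looked up and 500 rows were sorted; scaling the counts to 1 million and 500,000 gives the situation described in the text.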

In the end, because the data volume is large, even though only 5 rows are returned, in the extreme example above 1 million rows are actually read and 500,000 rows are sorted in memory.

So the query is very inefficient.

The schematic diagram illustrates the first step of the query. The pink part represents the rows that ultimately have to be read from the table.

The data in the diagram is made up, filled in according to the index storage rules; if anything is wrong, please leave a comment. The diagram is meant to show how a joint index is stored and how it is searched.

Improvement idea 1

A range condition cannot make full use of a joint index. What improves if we add a joint index on audit_time and status?

ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);
mysql> explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5;
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+
| id | select_type | table      | type  | possible_keys                     | key              | key_len | ref  | rows   | Extra       |
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | th_content | range | idx_at_ft_pt_let,idx_audit_status | idx_audit_status | 4       | NULL | 209754 | Using where |
+----+-------------+------------+-------+-----------------------------------+------------------+---------+------+--------+-------------+

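Details: because audit_time uses a range condition, the second column of the index cannot be used to narrow the index lookup. Only audit_time is used, so key_len is 4 (the size of an INT). In idea 2 below, both columns are used and key_len is 5 (4 bytes for audit_time plus 1 byte for the status ENUM).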
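The key_len arithmetic can be written out as a small Python helper. This is a sketch covering only the two column types in th_content (the function name key_len_bytes is hypothetical); it uses the storage sizes MySQL documents: INT takes 4 bytes, an ENUM with at most 255 members takes 1 byte, and a nullable column adds 1 byte to key_len.

```python
def key_len_bytes(col_type, nullable=False, enum_members=0):
    """Bytes one column adds to EXPLAIN key_len; only the two types in th_content are covered."""
    if col_type == "int":
        size = 4  # INT, signed or unsigned, is stored in 4 bytes
    elif col_type == "enum":
        size = 1 if enum_members <= 255 else 2  # ENUM stores a 1- or 2-byte member index
    else:
        raise ValueError(f"type not covered by this sketch: {col_type}")
    return size + (1 if nullable else 0)  # a nullable column costs one extra byte

audit_time = key_len_bytes("int")               # int(11) unsigned NOT NULL
status = key_len_bytes("enum", enum_members=5)  # 5-member ENUM, NOT NULL

# idx_audit_status: the range on audit_time means only the first column is used
print(audit_time)           # 4
# idx_status_audit: equality on status, then range on audit_time, so both are used
print(status + audit_time)  # 5
```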

Let's analyze the execution process after adding the index:

  • In the joint index, locate the entry with the largest audit_time that is still less than the given audit time
  • From there, scan in descending audit_time order one entry at a time. Because audit_time < ... is a range condition, the status values in the second column are scattered within that range, so each index entry has to be checked against status = 'ONLINE' one by one until the 5th matching entry is found.
  • Go back to the table to fetch the columns needed for those rows
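The three steps above can be sketched as a toy Python model. Assumptions: idx_audit_status is modeled as a sorted list of (audit_time, status, id) tuples, the function name and data are hypothetical, and audit_time values are unique so ties do not affect the order.

```python
import bisect

def scan_idx_audit_status(index, cutoff, limit=5):
    """index: ascending list of (audit_time, status, id) tuples standing in for idx_audit_status."""
    # Step 1: locate the end of the range audit_time < cutoff
    pos = bisect.bisect_left(index, (cutoff,))
    scanned, matches = 0, []
    # Step 2: walk backwards and test status directly on the index entry (no table access)
    for i in range(pos - 1, -1, -1):
        _, status, pk = index[i]
        scanned += 1
        if status == "ONLINE":
            matches.append(pk)
            if len(matches) == limit:
                break
    # Step 3: only the matching primary keys are looked up in the table
    return matches, {"index_entries_scanned": scanned, "table_lookups": len(matches)}

# Hypothetical data: ids 1..20, audit_time equal to id, every third row ONLINE
index = sorted((t, "ONLINE" if t % 3 == 0 else "OFFLINE", t) for t in range(1, 21))
matches, stats = scan_idx_audit_status(index, cutoff=20)
print(matches)  # [18, 15, 12, 9, 6]
print(stats)    # {'index_entries_scanned': 14, 'table_lookups': 5}
```

Here 14 index entries are read but only 5 rows are looked up in the table; the number of index entries read depends entirely on how the ONLINE rows are distributed.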

In the diagram above, the pink entries are the index rows that satisfy the condition on the first column. Scanning them one by one, we find three matching records on this leaf node, so the scan has to continue into the previous leaf node to the left until 5 matching records are found; finally, those rows are fetched from the table.

Improvements

Because the status value is stored in the index, filtering rows with status = 'ONLINE' does not require going back to the table. Only the 5 final rows are looked up in the table, which greatly reduces IOPS.

Disadvantages of this index

If the first five rows scanned in idx_audit_status all have status ONLINE, only five rows need to be scanned.

If only 4 of the first 1 million rows scanned in idx_audit_status have status ONLINE, at least 1,000,001 rows must be scanned to get the 5 required rows. In other words, the number of rows this index has to scan is unpredictable.
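The best and worst cases can be reproduced with a small counter in Python. It is a sketch that only tracks how many idx_audit_status entries are read; the function name entries_scanned and the status sequences are hypothetical.

```python
from itertools import chain, repeat

def entries_scanned(statuses_desc, limit=5):
    """Count idx_audit_status entries read, given their status values in descending audit_time order."""
    scanned = found = 0
    for status in statuses_desc:
        scanned += 1
        if status == "ONLINE":
            found += 1
            if found == limit:
                break
    return scanned

# Best case: the five newest entries are all ONLINE
best = chain(repeat("ONLINE", 5), repeat("OFFLINE", 1_000_000))
# Worst case from the text: 4 ONLINE rows among the first 1,000,000, the 5th right after
worst = chain(repeat("ONLINE", 4), repeat("OFFLINE", 999_996), repeat("ONLINE", 1))

print(entries_scanned(best))   # 5
print(entries_scanned(worst))  # 1000001
```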

Improvement idea 2

ALTER TABLE `th_content` DROP INDEX `idx_audit_status`;
ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`); 

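This way, neither sorting nor going back to the table is a burden. The equality condition status = 'ONLINE' restricts the scan to the ONLINE entries, which are already ordered by audit_time and then by the primary key id (InnoDB appends the primary key to every secondary index entry). MySQL can read them backwards from the largest audit_time below the cutoff, stop after 5 entries, and go back to the table for only those 5 rows.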
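A toy Python model shows why this index needs no sort and reads at most 5 entries. Assumptions: idx_status_audit is modeled as a sorted list of (status, audit_time, id) tuples, statuses compare as strings (MySQL actually orders ENUM values by member index, which does not change the result here), and the function name and data are hypothetical.

```python
import bisect

def scan_idx_status_audit(index, cutoff, limit=5):
    """index: ascending list of (status, audit_time, id) tuples standing in for idx_status_audit."""
    # Start of the status = 'ONLINE' group
    start = bisect.bisect_left(index, ("ONLINE",))
    # End of the range status = 'ONLINE' AND audit_time < cutoff
    end = bisect.bisect_left(index, ("ONLINE", cutoff))
    # Inside the group, entries are already ordered by (audit_time, id): take the last `limit`
    first = max(start, end - limit)
    hits = [pk for _, _, pk in reversed(index[first:end])]
    return hits, {"index_entries_scanned": end - first, "table_lookups": len(hits)}

# Hypothetical data: ids 1..20, audit_time equal to id, every third row ONLINE
index = sorted(("ONLINE" if t % 3 == 0 else "OFFLINE", t, t) for t in range(1, 21))
hits, stats = scan_idx_status_audit(index, cutoff=20)
print(hits)   # [18, 15, 12, 9, 6]
print(stats)  # {'index_entries_scanned': 5, 'table_lookups': 5}
```

Whatever the distribution of statuses, at most `limit` entries are read, and they already come out in ORDER BY audit_time DESC, id DESC order, so no sort step is needed.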

Summary

The above is the full content of this article. I hope it has some reference value for your study or work. Thank you for your support of 123WORDPRESS.COM.