A detailed analysis and processing of MySQL alarms

Recently, a service has an alarm, which has made me unbearable. The alarm information is as follows:

Metric:mysql.innodb_row_lock_waits Tags:port=4306,service=xxxx diff(#1): 996>900

The general meaning is that there is a database monitoring indicator innodb_row_lock_waits that currently exceeds the threshold of 900

But the embarrassing thing is that every time I went to check the environment after an alarm, the information I got was very limited. There was not enough information in the slow log and error log for analysis. After a while, I began to calm down and analyze the cause of the problem.

First of all, the time point of this alarm information seems to be somewhat regular. I compared the alarm time in recent days and found that it is still relatively regular. So what tasks may be triggered at the system level? I looked up and compared the relevant task configurations and found that there is a scheduled task that will be executed once every minute. But here comes the question. If it is executed once every minute, why will there be large differences in processing results at a specific time? Of course, the explanation of this phenomenon is just a beginning.

In fact, it is quite easy to prove this point. Today, I adopted a wait-and-see mode. I opened the general log around the time of the alarm. From the log output, the frequency of operations is relatively limited.

Soon I got regular alarms, so I started to capture relevant general log records. For example, at 11:18, we can use the following model to get relevant logs. First, get a temporary general log file to capture various DML and execution operations.

cat general.log|grep -E "insert|delete|update|select|exec" > general_tmp.log

Let’s take 11:18 as an example. We can compare the time before and after 1 or 2 minutes. The results are as follows:

# less general_tmp.log |grep "11:18"|wc -l
400
# less general_tmp.log |grep "11:17"|wc -l
666
# less general_tmp.log |grep "11:16"|wc -l
15

It was found that the numbers matched around 1 minute after the alarm was raised.

The data volume of this table is more than 2 million, and the table structure is as follows:

CREATE TABLE `task_queue` (
 `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
 `TaskStepID` bigint(20) DEFAULT NULL COMMENT 'Task step ID task_step_conf',
 `QOrder` int(11) DEFAULT NULL COMMENT 'Queue sorting task_step_confi.Step_ID',
 `QState` tinyint(4) DEFAULT '1' COMMENT 'Queue status 1: Waiting to be executed 2: Executing 3: Successful execution 4: Failed to execute',
 `QExcCount` int(11) DEFAULT '1' COMMENT 'Number of executions',
 `CrtTime` datetime DEFAULT NULL COMMENT 'Creation time',
 `ModTime` datetime DEFAULT NULL COMMENT 'Modification time',
 PRIMARY KEY (`AccID`),
 KEY `idx_taskstepid` (`TaskStepID`),
 KEY `idx_qstate` (`QState`)
) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8

Based on the analysis and comparison in the logs, we can basically identify the SQL as being in an Update operation. The SQL execution plan is as follows:

>>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: UPDATE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_taskstepid,idx_qstate
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary

In this execution result, key_len is 2,9, which is different from the previous ken_len calculation rule. The Extra column has given a clear hint that this is an intersect process. What’s special is that it is a process based on the secondary index level. There is a related parameter index_merge_intersection at the optimizer level.

We know that in MySQL, the primary key is a first-class citizen, and the secondary index will eventually be mapped to the primary key level for processing. The index-level intersect is actually a bit like our left and right hands. The left hand corresponds to some data results mapped to a batch of primary key ids, and the right hand corresponds to some data results mapped to another batch of primary key ids. The primary key id values of the two are intersect-calculated. So in the current scenario, is index-level intersect a good idea?

Here I have imagined three comparative scenarios for analysis. First, this is an update statement. In order to ensure the repeatability of subsequent tests, we can convert it into a select statement.

select * from task_queue where QState=0 and taskstepid =411;

Therefore, our comparative test is based on query statements for comparison and analysis.

Scenario 1: The optimizer keeps the default index_merge_intersection enabled and extracts execution details based on the profile

>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 2: The optimizer turns off index_merge_intersection and compares based on profiles

>set session optimizer_switch='index_merge_intersection=off';


>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: ref
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate
  key_len: 2
   ref: const
   rows: 1451
  filtered: 0.82
  Extra: Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 3: Reconstruct the index and perform comparative analysis

According to the business logic, if a composite index is created, the size of the result set can be greatly reduced, while the idx_qstat index is still retained, so that some businesses can still be used normally.

>alter table task_queue drop key idx_taskstepid;
>alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState);
explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: task_queue
  partitions: NULL
     type: ref
possible_keys: idx_qstate,idx_taskstepid
     key: idx_taskstepid
   key_len: 11
     ref: const,const
     rows: 1
   filtered: 100.00
    Extra: NULL
1 row in set, 1 warning (0.00 sec)

The profile information is:

It can be clearly seen that the "Sending data" part has been reduced by two orders of magnitude through index reconstruction.

So the next thing is to conduct further analysis and verification, with reason and evidence, and the waiting process is no longer hesitant. A day has passed and no more alarms have been received, which once again shows that we should not underestimate these alarms at work.

Summarize

This is the end of this article about MySQL alarm analysis and processing. For more relevant MySQL alarm processing content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:

How to remove the alarm sound when connecting to mysql

<<: A comprehensive understanding of Vue.js functional components

>>: Solution to Docker's failure to release ports

Solution to the error in compiling LVGL emulator on Linux

A detailed analysis and processing of MySQL alarms

Solution to the error in compiling LVGL emulator on Linux

js to realize automatic lock screen function

How to build a redis cluster using docker

About VUE's compilation scope and slot scope slot issues

Layui table selects the radio button of the specified row and scrolls to the implementation code of the row

Solution - BASH: /HOME/JAVA/JDK1.8.0_221/BIN/JAVA: Insufficient permissions

Implementation of MySQL index-based stress testing

How to use ssh tunnel to connect to mysql server

Analysis of common basic operations of MySQL database [create, view, modify and delete database]

Detailed tutorial on installing Mysql5.7.19 on Centos7 under Linux

Recommend

Detailed explanation of how to install centos7 using win10's built-in virtual machine hyper-v

Vue implements card flip carousel display

Docker Data Storage Volumes Detailed Explanation

Summary of 3 ways to lazy load vue-router

MySQL Interview Questions: How to Set Up Hash Indexes

Detailed explanation of replication configuration example between mysql containers

MySQL implements a solution similar to Oracle sequence

Some suggestions for Linux system optimization (kernel optimization)

Jenkins builds Docker images and pushes them to Harbor warehouse

HTML basic structure_Powernode Java Academy

Basic operations of mysql learning notes table

Linux bash: ./xxx: Unable to execute binary file error

Top 10 Js Image Processing Libraries

How to upload the jar package to nexus via the web page

MYSQL METADATA LOCK (MDL LOCK) MDL lock problem analysis