A detailed analysis and processing of MySQL alarms

A detailed analysis and processing of MySQL alarms

Recently, a service has an alarm, which has made me unbearable. The alarm information is as follows:

Metric:mysql.innodb_row_lock_waits Tags:port=4306,service=xxxx diff(#1): 996>900

The general meaning is that there is a database monitoring indicator innodb_row_lock_waits that currently exceeds the threshold of 900

But the embarrassing thing is that every time I went to check the environment after an alarm, the information I got was very limited. There was not enough information in the slow log and error log for analysis. After a while, I began to calm down and analyze the cause of the problem.

First of all, the time point of this alarm information seems to be somewhat regular. I compared the alarm time in recent days and found that it is still relatively regular. So what tasks may be triggered at the system level? I looked up and compared the relevant task configurations and found that there is a scheduled task that will be executed once every minute. But here comes the question. If it is executed once every minute, why will there be large differences in processing results at a specific time? Of course, the explanation of this phenomenon is just a beginning.

In fact, it is quite easy to prove this point. Today, I adopted a wait-and-see mode. I opened the general log around the time of the alarm. From the log output, the frequency of operations is relatively limited.

Soon I got regular alarms, so I started to capture relevant general log records. For example, at 11:18, we can use the following model to get relevant logs. First, get a temporary general log file to capture various DML and execution operations.

cat general.log|grep -E "insert|delete|update|select|exec" > general_tmp.log

Let’s take 11:18 as an example. We can compare the time before and after 1 or 2 minutes. The results are as follows:

# less general_tmp.log |grep "11:18"|wc -l

400

# less general_tmp.log |grep "11:17"|wc -l

666

# less general_tmp.log |grep "11:16"|wc -l

15

It was found that the numbers matched around 1 minute after the alarm was raised.

The data volume of this table is more than 2 million, and the table structure is as follows:

CREATE TABLE `task_queue` (
 `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
 `TaskStepID` bigint(20) DEFAULT NULL COMMENT 'Task step ID task_step_conf',
 `QOrder` int(11) DEFAULT NULL COMMENT 'Queue sorting task_step_confi.Step_ID',
 `QState` tinyint(4) DEFAULT '1' COMMENT 'Queue status 1: Waiting to be executed 2: Executing 3: Successful execution 4: Failed to execute',
 `QExcCount` int(11) DEFAULT '1' COMMENT 'Number of executions',
 `CrtTime` datetime DEFAULT NULL COMMENT 'Creation time',
 `ModTime` datetime DEFAULT NULL COMMENT 'Modification time',
 PRIMARY KEY (`AccID`),
 KEY `idx_taskstepid` (`TaskStepID`),
 KEY `idx_qstate` (`QState`)
) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8

Based on the analysis and comparison in the logs, we can basically identify the SQL as being in an Update operation. The SQL execution plan is as follows:

>>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: UPDATE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_taskstepid,idx_qstate
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary

In this execution result, key_len is 2,9, which is different from the previous ken_len calculation rule. The Extra column has given a clear hint that this is an intersect process. What’s special is that it is a process based on the secondary index level. There is a related parameter index_merge_intersection at the optimizer level.

We know that in MySQL, the primary key is a first-class citizen, and the secondary index will eventually be mapped to the primary key level for processing. The index-level intersect is actually a bit like our left and right hands. The left hand corresponds to some data results mapped to a batch of primary key ids, and the right hand corresponds to some data results mapped to another batch of primary key ids. The primary key id values ​​of the two are intersect-calculated. So in the current scenario, is index-level intersect a good idea?

Here I have imagined three comparative scenarios for analysis. First, this is an update statement. In order to ensure the repeatability of subsequent tests, we can convert it into a select statement.

select * from task_queue where QState=0 and taskstepid =411;

Therefore, our comparative test is based on query statements for comparison and analysis.

Scenario 1: The optimizer keeps the default index_merge_intersection enabled and extracts execution details based on the profile

>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 2: The optimizer turns off index_merge_intersection and compares based on profiles

>set session optimizer_switch='index_merge_intersection=off';


>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: ref
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate
  key_len: 2
   ref: const
   rows: 1451
  filtered: 0.82
  Extra: Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 3: Reconstruct the index and perform comparative analysis

According to the business logic, if a composite index is created, the size of the result set can be greatly reduced, while the idx_qstat index is still retained, so that some businesses can still be used normally.

>alter table task_queue drop key idx_taskstepid;
>alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState);
explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: task_queue
  partitions: NULL
     type: ref
possible_keys: idx_qstate,idx_taskstepid
     key: idx_taskstepid
   key_len: 11
     ref: const,const
     rows: 1
   filtered: 100.00
    Extra: NULL
1 row in set, 1 warning (0.00 sec)

The profile information is:

It can be clearly seen that the "Sending data" part has been reduced by two orders of magnitude through index reconstruction.

So the next thing is to conduct further analysis and verification, with reason and evidence, and the waiting process is no longer hesitant. A day has passed and no more alarms have been received, which once again shows that we should not underestimate these alarms at work.

Summarize

This is the end of this article about MySQL alarm analysis and processing. For more relevant MySQL alarm processing content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to remove the alarm sound when connecting to mysql

<<:  A comprehensive understanding of Vue.js functional components

>>:  Solution to Docker's failure to release ports

Recommend

Vue implements card flip carousel display

Vue card flip carousel display, while switching d...

Docker Data Storage Volumes Detailed Explanation

By default, the reading and writing of container ...

Summary of 3 ways to lazy load vue-router

Not using lazy loading import Vue from 'vue&#...

MySQL Interview Questions: How to Set Up Hash Indexes

In addition to B-Tree indexes, MySQL also provide...

MySQL implements a solution similar to Oracle sequence

MySQL implements Oracle-like sequences Oracle gen...

Some suggestions for Linux system optimization (kernel optimization)

Disable swap If the server is running a database ...

Jenkins builds Docker images and pushes them to Harbor warehouse

Table of contents Dockerfile pom.xml Jenkins Conf...

HTML basic structure_Powernode Java Academy

Many times when learning web page development, th...

Basic operations of mysql learning notes table

Create Table create table table name create table...

Linux bash: ./xxx: Unable to execute binary file error

Today I sent a small tool for Ubuntu to a custome...

Top 10 Js Image Processing Libraries

Table of contents introduce 1. Pica 2. Lena.js 3....

How to upload the jar package to nexus via the web page

When using Maven to manage projects, how to uploa...

MYSQL METADATA LOCK (MDL LOCK) MDL lock problem analysis

1. Introduction MDL lock in MYSQL has always been...