A detailed analysis and processing of MySQL alarms

A detailed analysis and processing of MySQL alarms

Recently, a service has an alarm, which has made me unbearable. The alarm information is as follows:

Metric:mysql.innodb_row_lock_waits Tags:port=4306,service=xxxx diff(#1): 996>900

The general meaning is that there is a database monitoring indicator innodb_row_lock_waits that currently exceeds the threshold of 900

But the embarrassing thing is that every time I went to check the environment after an alarm, the information I got was very limited. There was not enough information in the slow log and error log for analysis. After a while, I began to calm down and analyze the cause of the problem.

First of all, the time point of this alarm information seems to be somewhat regular. I compared the alarm time in recent days and found that it is still relatively regular. So what tasks may be triggered at the system level? I looked up and compared the relevant task configurations and found that there is a scheduled task that will be executed once every minute. But here comes the question. If it is executed once every minute, why will there be large differences in processing results at a specific time? Of course, the explanation of this phenomenon is just a beginning.

In fact, it is quite easy to prove this point. Today, I adopted a wait-and-see mode. I opened the general log around the time of the alarm. From the log output, the frequency of operations is relatively limited.

Soon I got regular alarms, so I started to capture relevant general log records. For example, at 11:18, we can use the following model to get relevant logs. First, get a temporary general log file to capture various DML and execution operations.

cat general.log|grep -E "insert|delete|update|select|exec" > general_tmp.log

Let’s take 11:18 as an example. We can compare the time before and after 1 or 2 minutes. The results are as follows:

# less general_tmp.log |grep "11:18"|wc -l

400

# less general_tmp.log |grep "11:17"|wc -l

666

# less general_tmp.log |grep "11:16"|wc -l

15

It was found that the numbers matched around 1 minute after the alarm was raised.

The data volume of this table is more than 2 million, and the table structure is as follows:

CREATE TABLE `task_queue` (
 `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
 `TaskStepID` bigint(20) DEFAULT NULL COMMENT 'Task step ID task_step_conf',
 `QOrder` int(11) DEFAULT NULL COMMENT 'Queue sorting task_step_confi.Step_ID',
 `QState` tinyint(4) DEFAULT '1' COMMENT 'Queue status 1: Waiting to be executed 2: Executing 3: Successful execution 4: Failed to execute',
 `QExcCount` int(11) DEFAULT '1' COMMENT 'Number of executions',
 `CrtTime` datetime DEFAULT NULL COMMENT 'Creation time',
 `ModTime` datetime DEFAULT NULL COMMENT 'Modification time',
 PRIMARY KEY (`AccID`),
 KEY `idx_taskstepid` (`TaskStepID`),
 KEY `idx_qstate` (`QState`)
) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8

Based on the analysis and comparison in the logs, we can basically identify the SQL as being in an Update operation. The SQL execution plan is as follows:

>>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: UPDATE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_taskstepid,idx_qstate
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary

In this execution result, key_len is 2,9, which is different from the previous ken_len calculation rule. The Extra column has given a clear hint that this is an intersect process. What’s special is that it is a process based on the secondary index level. There is a related parameter index_merge_intersection at the optimizer level.

We know that in MySQL, the primary key is a first-class citizen, and the secondary index will eventually be mapped to the primary key level for processing. The index-level intersect is actually a bit like our left and right hands. The left hand corresponds to some data results mapped to a batch of primary key ids, and the right hand corresponds to some data results mapped to another batch of primary key ids. The primary key id values ​​of the two are intersect-calculated. So in the current scenario, is index-level intersect a good idea?

Here I have imagined three comparative scenarios for analysis. First, this is an update statement. In order to ensure the repeatability of subsequent tests, we can convert it into a select statement.

select * from task_queue where QState=0 and taskstepid =411;

Therefore, our comparative test is based on query statements for comparison and analysis.

Scenario 1: The optimizer keeps the default index_merge_intersection enabled and extracts execution details based on the profile

>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: index_merge
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate,idx_taskstepid
  key_len: 2,9
   ref: NULL
   rows: 11
  filtered: 100.00
  Extra: Using intersect(idx_qstate,idx_taskstepid); Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 2: The optimizer turns off index_merge_intersection and compares based on profiles

>set session optimizer_switch='index_merge_intersection=off';


>explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
   id: 1
 select_type: SIMPLE
  table: task_queue
 partitions: NULL
   type: ref
possible_keys: idx_qstate,idx_taskstepid
   key: idx_qstate
  key_len: 2
   ref: const
   rows: 1451
  filtered: 0.82
  Extra: Using where
1 row in set, 1 warning (0.00 sec)

The profile information is:

Scenario 3: Reconstruct the index and perform comparative analysis

According to the business logic, if a composite index is created, the size of the result set can be greatly reduced, while the idx_qstat index is still retained, so that some businesses can still be used normally.

>alter table task_queue drop key idx_taskstepid;
>alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState);
explain select * from task_queue where QState=0 and taskstepid =411\G
*************************** 1. row ***************************
      id: 1
 select_type: SIMPLE
    table: task_queue
  partitions: NULL
     type: ref
possible_keys: idx_qstate,idx_taskstepid
     key: idx_taskstepid
   key_len: 11
     ref: const,const
     rows: 1
   filtered: 100.00
    Extra: NULL
1 row in set, 1 warning (0.00 sec)

The profile information is:

It can be clearly seen that the "Sending data" part has been reduced by two orders of magnitude through index reconstruction.

So the next thing is to conduct further analysis and verification, with reason and evidence, and the waiting process is no longer hesitant. A day has passed and no more alarms have been received, which once again shows that we should not underestimate these alarms at work.

Summarize

This is the end of this article about MySQL alarm analysis and processing. For more relevant MySQL alarm processing content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to remove the alarm sound when connecting to mysql

<<:  A comprehensive understanding of Vue.js functional components

>>:  Solution to Docker's failure to release ports

Recommend

Nginx Linux installation and deployment detailed tutorial

1. Introduction to Nginx Nginx is a web server th...

What are the rules for context in JavaScript functions?

Table of contents 1. Rule 1: Object.Method() 1.1 ...

HTML exceeds the text line interception implementation principle and code

The HTML code for intercepting text beyond multipl...

Summary of methods for querying MySQL user permissions

Introduce two methods to view MySQL user permissi...

Analysis of pitfalls in rounding operation of ROUND function in MySQL

This article uses examples to illustrate the pitf...

How to declare a cursor in mysql

How to declare a cursor in mysql: 1. Declare vari...

Difference between MySQL update set and and

Table of contents Problem Description Cause Analy...

How to configure ssh/sftp and set permissions under Linux operating system

Compared with FTP, SSH-based sftp service has bet...

Detailed explanation of the solution to image deformation under flex layout

Flex layout is a commonly used layout method nowa...

Common structural tags in XHTML

structure body, head, html, title text abbr, acro...

Detailed installation and configuration of MySql on Mac

1. Download and install Download the community ed...

Specific use of Linux dirname command

01. Command Overview dirname - strip non-director...

JavaScript static scope and dynamic scope explained with examples

Table of contents Preface Static scope vs. dynami...

Detailed explanation of how to use JavaScript paging component

The pagination component is a common component in...

HTML+CSS to implement the sample code of the navigation bar drop-down menu

Effect The pictures in the code can be changed by...