Recently, a service has an alarm, which has made me unbearable. The alarm information is as follows:
The general meaning is that there is a database monitoring indicator innodb_row_lock_waits that currently exceeds the threshold of 900 But the embarrassing thing is that every time I went to check the environment after an alarm, the information I got was very limited. There was not enough information in the slow log and error log for analysis. After a while, I began to calm down and analyze the cause of the problem. First of all, the time point of this alarm information seems to be somewhat regular. I compared the alarm time in recent days and found that it is still relatively regular. So what tasks may be triggered at the system level? I looked up and compared the relevant task configurations and found that there is a scheduled task that will be executed once every minute. But here comes the question. If it is executed once every minute, why will there be large differences in processing results at a specific time? Of course, the explanation of this phenomenon is just a beginning. In fact, it is quite easy to prove this point. Today, I adopted a wait-and-see mode. I opened the general log around the time of the alarm. From the log output, the frequency of operations is relatively limited. Soon I got regular alarms, so I started to capture relevant general log records. For example, at 11:18, we can use the following model to get relevant logs. First, get a temporary general log file to capture various DML and execution operations.
Let’s take 11:18 as an example. We can compare the time before and after 1 or 2 minutes. The results are as follows:
It was found that the numbers matched around 1 minute after the alarm was raised. The data volume of this table is more than 2 million, and the table structure is as follows: CREATE TABLE `task_queue` ( `AccID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID', `TaskStepID` bigint(20) DEFAULT NULL COMMENT 'Task step ID task_step_conf', `QOrder` int(11) DEFAULT NULL COMMENT 'Queue sorting task_step_confi.Step_ID', `QState` tinyint(4) DEFAULT '1' COMMENT 'Queue status 1: Waiting to be executed 2: Executing 3: Successful execution 4: Failed to execute', `QExcCount` int(11) DEFAULT '1' COMMENT 'Number of executions', `CrtTime` datetime DEFAULT NULL COMMENT 'Creation time', `ModTime` datetime DEFAULT NULL COMMENT 'Modification time', PRIMARY KEY (`AccID`), KEY `idx_taskstepid` (`TaskStepID`), KEY `idx_qstate` (`QState`) ) ENGINE=InnoDB AUTO_INCREMENT=3398341 DEFAULT CHARSET=utf8 Based on the analysis and comparison in the logs, we can basically identify the SQL as being in an Update operation. The SQL execution plan is as follows: >>explain update task_queue set QState=1,QExcCount=QExcCount+1,modtime=now() where QState=0 and taskstepid =411\G *************************** 1. row *************************** id: 1 select_type: UPDATE table: task_queue partitions: NULL type: index_merge possible_keys: idx_taskstepid,idx_qstate key: idx_qstate,idx_taskstepid key_len: 2,9 ref: NULL rows: 11 filtered: 100.00 Extra: Using intersect(idx_qstate,idx_taskstepid); Using where; Using temporary In this execution result, key_len is 2,9, which is different from the previous ken_len calculation rule. The Extra column has given a clear hint that this is an intersect process. What’s special is that it is a process based on the secondary index level. There is a related parameter index_merge_intersection at the optimizer level. We know that in MySQL, the primary key is a first-class citizen, and the secondary index will eventually be mapped to the primary key level for processing. The index-level intersect is actually a bit like our left and right hands. The left hand corresponds to some data results mapped to a batch of primary key ids, and the right hand corresponds to some data results mapped to another batch of primary key ids. The primary key id values of the two are intersect-calculated. So in the current scenario, is index-level intersect a good idea? Here I have imagined three comparative scenarios for analysis. First, this is an update statement. In order to ensure the repeatability of subsequent tests, we can convert it into a select statement. select * from task_queue where QState=0 and taskstepid =411; Therefore, our comparative test is based on query statements for comparison and analysis. Scenario 1: The optimizer keeps the default index_merge_intersection enabled and extracts execution details based on the profile >explain select * from task_queue where QState=0 and taskstepid =411\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: task_queue partitions: NULL type: index_merge possible_keys: idx_qstate,idx_taskstepid key: idx_qstate,idx_taskstepid key_len: 2,9 ref: NULL rows: 11 filtered: 100.00 Extra: Using intersect(idx_qstate,idx_taskstepid); Using where 1 row in set, 1 warning (0.00 sec) The profile information is: Scenario 2: The optimizer turns off index_merge_intersection and compares based on profiles >set session optimizer_switch='index_merge_intersection=off'; >explain select * from task_queue where QState=0 and taskstepid =411\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: task_queue partitions: NULL type: ref possible_keys: idx_qstate,idx_taskstepid key: idx_qstate key_len: 2 ref: const rows: 1451 filtered: 0.82 Extra: Using where 1 row in set, 1 warning (0.00 sec) The profile information is: Scenario 3: Reconstruct the index and perform comparative analysis According to the business logic, if a composite index is created, the size of the result set can be greatly reduced, while the idx_qstat index is still retained, so that some businesses can still be used normally. >alter table task_queue drop key idx_taskstepid; >alter table task_queue add key `idx_taskstepid` (`TaskStepID`,QState); explain select * from task_queue where QState=0 and taskstepid =411\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: task_queue partitions: NULL type: ref possible_keys: idx_qstate,idx_taskstepid key: idx_taskstepid key_len: 11 ref: const,const rows: 1 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.00 sec) The profile information is: It can be clearly seen that the "Sending data" part has been reduced by two orders of magnitude through index reconstruction. So the next thing is to conduct further analysis and verification, with reason and evidence, and the waiting process is no longer hesitant. A day has passed and no more alarms have been received, which once again shows that we should not underestimate these alarms at work. Summarize This is the end of this article about MySQL alarm analysis and processing. For more relevant MySQL alarm processing content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: A comprehensive understanding of Vue.js functional components
>>: Solution to Docker's failure to release ports
1. Introduction to Nginx Nginx is a web server th...
Table of contents 1. Rule 1: Object.Method() 1.1 ...
The HTML code for intercepting text beyond multipl...
Introduce two methods to view MySQL user permissi...
This article uses examples to illustrate the pitf...
How to declare a cursor in mysql: 1. Declare vari...
Table of contents Problem Description Cause Analy...
Compared with FTP, SSH-based sftp service has bet...
Flex layout is a commonly used layout method nowa...
structure body, head, html, title text abbr, acro...
1. Download and install Download the community ed...
01. Command Overview dirname - strip non-director...
Table of contents Preface Static scope vs. dynami...
The pagination component is a common component in...
Effect The pictures in the code can be changed by...