Recently, several data anomalies occurred in our production MySQL, all of them in the early morning. Because the business is a typical data warehouse workload, the load is light during the day and the problems could not be reproduced then. Some of the anomalies were quite strange, and finding the root cause took a lot of effort. So how can we quickly locate production MySQL problems and fix them in real business scenarios? Below I share some experience and methods based on two real cases.

Case 1: Partial data update failed

One day, a colleague from the channel team reported that a few channels in a certain report showed 0, while the data for most channels was normal. This data is refreshed every morning by a statistics job. Logically, either everything should succeed or everything should fail. So why were only a few individual data points wrong?

The first thing to check is the statistics job log. However, the log printed by the statistics program contained no errors such as a failed SQL update. So what was happening in the database at that time? Before looking at the MySQL server log, I habitually checked the database status with SHOW ENGINE INNODB STATUS, and it showed a deadlock between updates in the early morning.

For reasons of space, much of the context is omitted here. The log shows that TRANSACTION 1 and TRANSACTION 2 each hold a number of row locks and each wait for a lock held by the other. MySQL detects the deadlock and chooses to roll back TRANSACTION 1: InnoDB currently resolves a deadlock by rolling back the transaction that holds the fewest row-level exclusive locks.

This raises three questions:

1. Doesn't an InnoDB row lock lock only one row?

This table uses the InnoDB engine, which supports both row locks and table locks. InnoDB implements row locks by locking index entries.
This differs from Oracle, which implements row locks by locking the corresponding data rows in the data block. A consequence of this design is that InnoDB takes precise row-level locks only when the rows are found through an index condition. Otherwise it has to scan the table and locks every row it scans, which in effect locks the whole table! In practice this property of InnoDB row locks deserves special attention; otherwise heavy lock contention can hurt concurrency.

Because row locks are placed on index entries rather than on the records themselves, statements that access different rows through the same index key will conflict. When data is retrieved with a range condition rather than an equality condition and a shared or exclusive lock is requested, InnoDB locks the index entries of the existing records that match the condition; in addition, gap locks cover the gaps in that range, so multiple rows are effectively locked. Besides range conditions, InnoDB also uses a gap lock when an equality condition requests a lock on a record that does not exist!

With that in mind, let's look at the indexes on our business table. They are clearly unreasonable: there are three indexes, but the update statement cannot make full use of any of them, so it cannot pinpoint its rows through an index, has to lock many rows, and ends up in a deadlock. Once the principle is understood, we can construct a composite index on the four fields so that the update goes precisely through an InnoDB index. Indeed, after we changed the index, the deadlock problem disappeared.

Note: InnoDB prints not only the transactions and the locks they hold and wait for, but also the records themselves. Unfortunately, this may exceed the length InnoDB reserves for the output (only 1 MB can be printed, and only the most recent deadlock is retained).
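When the status output is long, it can help to pull out only the deadlock report. The Python sketch below assumes the status text has already been fetched (for example from SHOW ENGINE INNODB STATUS) and that each section header is an upper-case title framed by two lines of dashes. The function name `extract_section` and the sample text are made up for illustration and are not from the original article.

```python
import re

# A section header is an upper-case title framed by two lines of dashes, e.g.
#   ------------------------
#   LATEST DETECTED DEADLOCK
#   ------------------------
SECTION_HEADER = re.compile(r"^-+\n([A-Z][A-Z /]*)\n-+\n", re.MULTILINE)


def extract_section(status_text, title="LATEST DETECTED DEADLOCK"):
    """Return the body of the named section, or None if it is not present."""
    headers = list(SECTION_HEADER.finditer(status_text))
    for i, header in enumerate(headers):
        if header.group(1).strip() == title:
            # The body runs until the next section header (or the end of the text).
            end = headers[i + 1].start() if i + 1 < len(headers) else len(status_text)
            return status_text[header.end():end].strip()
    return None


# Made-up sample in the general shape of the status output, for illustration only.
SAMPLE = """\
------------------------
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION:
*** (2) TRANSACTION:
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 100
"""

print(extract_section(SAMPLE))
```

Keeping the parsing separate from the database call means the same helper can be run against saved status dumps or against text copied out of the error log.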
If you cannot see the complete output, you can create a table named innodb_monitor or innodb_lock_monitor in any database; the InnoDB status information will then be written in full to the error log every 15 seconds. For example: create table innodb_monitor(a int)engine=innodb;. When you no longer need it in the error log, just drop the table. (From MySQL 5.6.16 the innodb_status_output and innodb_status_output_locks system variables serve the same purpose, and the table-based method was removed in MySQL 5.7.)

2. Why do only some update statements fail on rollback?

If the transaction is rolled back, why do only some updates fail instead of all updates in the transaction? This is because autocommit is enabled by default in our InnoDB setup: when several update or insert statements are issued, each statement is committed as soon as it finishes, which persists its changes and releases its locks. That is why, in this example, only a few statements failed after the deadlock rolled back a transaction.

There is another situation that can also cause only some statements to be rolled back, and it deserves special attention. InnoDB has a parameter called innodb_rollback_on_timeout. The official manual describes it this way:

"In MySQL 5.1, InnoDB rolls back only the last statement on a transaction timeout by default. If --innodb_rollback_on_timeout is specified, a transaction timeout causes InnoDB to abort and roll back the entire transaction (the same behavior as in MySQL 4.1). This variable was added in MySQL 5.1.15."

In other words: if this parameter is off or absent, only the last statement of the transaction is rolled back when a lock wait times out; if it is on, the entire transaction is rolled back.

3. How can the probability of InnoDB deadlocks be reduced?

Deadlocks are hard to eliminate completely when row locks and transactions are in use, but lock conflicts and deadlocks can be reduced through table design and SQL tuning, including:

- Use as low an isolation level as possible. For example, if gap locks are the problem, you can change the isolation level of the session or transaction to RC (read committed) to avoid them. However, you then need to set binlog_format to row or mixed.
- Design indexes carefully and access data through indexes wherever possible, so that locking is more precise and lock conflicts are less likely.
- Choose a reasonable transaction size. Small transactions are less likely to run into lock conflicts.
- When locking a set of records explicitly, request a sufficient lock level in one go. For example, if you intend to modify data, request an exclusive lock directly rather than taking a shared lock first and upgrading to an exclusive lock at modification time, which is prone to deadlock.
- When different programs access the same set of tables, have them agree to access the tables in the same order, and access the rows of a table in a fixed order. This greatly reduces the chance of deadlock.
- Prefer equality conditions when accessing data, to avoid the impact of gap locks on concurrent inserts.
- Do not request a lock level higher than actually needed, and do not take explicit locks in queries unless necessary.
- For certain specific transactions, table locks can be used to speed up processing or to reduce the possibility of deadlock.

Case 2: Weird Lock wait timeout

For several days in a row, a task failed at 6 a.m. and at 8 a.m. While running load data local infile, a Java SQL exception was reported: "Lock wait timeout exceeded; try restarting transaction". After talking to colleagues on the platform team, we were told that either the lock wait timeout of our own business database was too short or there was a lock conflict. But thinking it over, that did not seem right. Hadn't this always worked fine? Besides, these are basically single-table tasks, so there should be no conflicts between multiple users.
Regardless of whose problem it was, we first checked our own database. The default lock wait timeout (innodb_lock_wait_timeout) is 50 s, which is really not short, so I guessed that raising it would not help. In fact, I tried it as a last resort and it made no difference.

This time, SHOW ENGINE INNODB STATUS\G showed no deadlock information either. So I turned to the MySQL server logs, hoping to see what the database was doing around that moment. Here is a brief overview of the MySQL log files:

(a) Error log: records problems that occur when starting, running, or stopping mysqld. It is enabled by default.

From this overview, the records for this problem could be in d or b. A check showed nothing in d, so the only option was to enable b, the general log. However, b costs the database some performance, and because it is a full log its volume is very large, so it must be enabled with caution. I turned on the full log only for half an hour before and after the time the problem occurred each day, and found no MySQL client requests reaching our business database at all! The log records every connection and command.

That essentially settled it: the exception was thrown before the client's request ever reached us. After repeated communication with the platform team, they finally confirmed the cause: before executing the insert, the platform has to fetch the SQL from its SQL task table and update the task status. That table received a large number of concurrent inserts and updates on the hour, so some statements timed out while waiting for locks.

MySQL log analysis script

Early morning is the peak period for data warehouse workloads, so many problems occur then. Some strange problems disappear without a trace and cannot be reproduced during the day.
Capturing the logs we care about, so that the problem can be located quickly, is therefore of utmost importance. For this I wrote a small script to be deployed via crontab: you choose a time range in which it runs, and it samples the log once a minute. Note that the general log should not be enabled lightly, because it causes a significant loss of database performance.
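A minimal sketch of such a sampler, not the author's original script, might look like this in Python. It assumes the mysql command-line client is on the PATH and reads its credentials from an option file. The sampling window, the two statements being sampled, and the names `in_window`, `run_mysql`, `sample_once` and `main` are all assumptions chosen for illustration.

```python
import datetime as dt
import subprocess

# Hypothetical settings, not from the article: adjust to your environment.
WINDOW_START = dt.time(2, 0)   # begin sampling at 02:00
WINDOW_END = dt.time(6, 0)     # stop sampling at 06:00
STATEMENTS = ["SHOW ENGINE INNODB STATUS\\G", "SHOW FULL PROCESSLIST\\G"]


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """True if the time `now` falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_mysql(statement):
    """Run one statement through the mysql client and return whatever it printed."""
    # Assumes credentials are supplied by an option file such as ~/.my.cnf.
    result = subprocess.run(
        ["mysql", "-e", statement], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def sample_once(now, runner=run_mysql):
    """Return one timestamped snapshot, or None when outside the sampling window."""
    if not in_window(now.time()):
        return None
    parts = [f"===== {now.isoformat(sep=' ', timespec='seconds')} ====="]
    for statement in STATEMENTS:
        parts.append(f"-- {statement}")
        parts.append(runner(statement))
    return "\n".join(parts) + "\n"


def main():
    """Entry point for cron: print a snapshot if the current time is in the window."""
    snapshot = sample_once(dt.datetime.now())
    if snapshot:
        print(snapshot, end="")
```

Scheduled from crontab to call `main()` every minute with its output appended to a file, this yields one snapshot per minute inside the window and nothing outside it. Passing the runner in as a parameter keeps the window logic testable without a database.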