MySQL master-slave replication delay causes and solutions

Source: Public account "Oracle's Shadow Gallery"

In an asynchronous or semi-synchronous replication setup, it is normal for slaves to lag behind the master.
Whether that lag needs attention is usually decided by the business.
For example, if the slave serves reads that need strong consistency and the lag must stay below a certain value, it has to be watched.

A brief overview of the replication logic:

1. The master records changes to the database instance in the binlog.
2. The master's binlog dump thread watches the binlog in real time and pushes new events to the slave. (When it has nothing left to send, its thread state is "Master has sent all binlog to slave; waiting for more updates".)
3. The slave's IO Thread receives these events and records them in the relay log.
4. The slave's SQL Thread reads the events from the relay log and applies (replays) them on the slave instance.

The above is the default asynchronous replication logic. Semi-synchronous replication is slightly different and will not be described here.

It is also easy to tell whether a slave is lagging:
on the slave, run SHOW SLAVE STATUS
and check the Seconds_Behind_Master value.
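
The check above can be scripted. The following is only a minimal sketch: it assumes the SHOW SLAVE STATUS row has already been fetched into a dict by whatever client library you use, and the function name replication_lag and the 30-second default threshold are illustrative choices, not something from the original article.

```python
from typing import Any, Mapping


def replication_lag(status: Mapping[str, Any], max_lag: int = 30) -> str:
    """Classify one SHOW SLAVE STATUS row that was fetched as a dict."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if not (io_ok and sql_ok):
        return "replication stopped"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL when it cannot compute the lag.
        return "lag unknown"
    # max_lag is a hypothetical business threshold, in seconds.
    return "delayed" if int(lag) > max_lag else "ok"
```

The threshold is a parameter because, as noted earlier, how much lag is acceptable is a business decision.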

Causes of delay and solutions

〇Frequent DML requests on the master (high TPS)

That is, the master receives many write requests, with a large number of concurrent INSERT, DELETE, and UPDATE operations that generate a large amount of binlog in a short time.

【Cause Analysis】

The master writes data concurrently, while the slave's SQL Thread applies the log in a single thread, so the relay log easily piles up and the slave falls behind.
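
To make the accumulation concrete, here is a toy model (not a measurement). It assumes the master produces a fixed number of events per second and the single SQL Thread replays a smaller fixed number; both rates and the function name relay_log_backlog are made up for illustration.

```python
from typing import List


def relay_log_backlog(master_tps: int, slave_apply_tps: int, seconds: int) -> List[int]:
    """Return the number of unapplied relay log events at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += master_tps                     # events received by the IO Thread
        backlog -= min(backlog, slave_apply_tps)  # events replayed by the SQL Thread
        history.append(backlog)
    return history


# Master writes 5000 events/s, slave replays 3000 events/s.
print(relay_log_backlog(5000, 3000, 3))  # [2000, 4000, 6000]
```

As long as the apply rate stays below the write rate, the backlog grows linearly and never recovers on its own.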

【Solution】

Shard the data and scale out to spread the write requests. Alternatively, upgrade to MySQL 5.7+ and enable parallel replication based on logical clocks (slave_parallel_type=LOGICAL_CLOCK with slave_parallel_workers greater than 0).
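
The idea behind logical-clock parallel replication can be sketched roughly as follows. In 5.7+ each transaction in the binlog carries two values, last_committed and sequence_number. The code below is a deliberately simplified reading of how they are used: it cuts the stream into batches, whereas the real applier schedules transactions continuously. The function name parallel_batches is hypothetical.

```python
from typing import List, Tuple

# Each transaction is (last_committed, sequence_number), as written to the binlog.
Txn = Tuple[int, int]


def parallel_batches(txns: List[Txn]) -> List[List[int]]:
    """Group sequence numbers into batches that could be replayed concurrently.

    Simplified rule: a transaction may join the current batch only if the
    transaction it depends on (last_committed) finished before the batch began.
    """
    batches: List[List[int]] = []
    batch_start = 0  # lowest sequence_number in the current batch
    for last_committed, seq in txns:
        if batches and last_committed < batch_start:
            batches[-1].append(seq)
        else:
            batches.append([seq])
            batch_start = seq
    return batches


sample = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5), (4, 6)]
print(parallel_batches(sample))  # [[1], [2, 3, 4], [5, 6]]
```

The more transactions the master commits together, the larger these batches become, which is why this mode helps most under heavy concurrent writes.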

〇The master executes large transactions

For example, bulk imports such as INSERT INTO $tb1 SELECT * FROM $tb2 or LOAD DATA INFILE, or an UPDATE or DELETE that touches an entire table.
On the slave, Exec_Master_Log_Pos does not change and Slave_SQL_Running_State is Reading event from the relay log.
Analyze the master's binlog at that position to see which transaction the master executed.

【Cause Analysis】

If the master takes 200 seconds to update a large table and the master and slave have similar configurations, the slave needs roughly the same time to replay the update. During that time the lag accumulates, and the events queued behind it cannot be applied.

【Solution】

Split large transactions into smaller ones and commit each one promptly.
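
One common way to split a big DELETE is to walk the primary key in fixed-size ranges and commit after each range. The sketch below only generates the statements. It assumes an integer primary key column named id, and the helper names id_ranges and batched_deletes are illustrative; adapt them to your own schema.

```python
from typing import Iterator, Tuple


def id_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) primary-key ranges covering min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


def batched_deletes(table: str, min_id: int, max_id: int, batch_size: int) -> Iterator[str]:
    """Build one small DELETE per range; each is meant to be committed on its own."""
    for low, high in id_ranges(min_id, max_id, batch_size):
        # Assumes an integer primary key column called "id" (hypothetical).
        yield f"DELETE FROM {table} WHERE id BETWEEN {low} AND {high}"


for stmt in batched_deletes("orders_archive", 1, 10, 4):
    print(stmt)
```

Each statement then becomes its own short transaction in the binlog, so the slave replays many small events instead of one long one.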

〇The master executes DDL statements on large tables

The symptoms are similar to those of a large transaction on the master.
If Exec_Master_Log_Pos is not moving, a DDL statement may be the cause.
Analyze the master's binlog at that position to see which statement the master executed.

【Cause Analysis】

1. The DDL has not started because it is blocked. SHOW SLAVE STATUS shows Slave_SQL_Running_State as Waiting for table metadata lock, and Exec_Master_Log_Pos stays unchanged.
2. The DDL is running, and because the SQL Thread applies it in a single thread, the lag grows. Slave_SQL_Running_State is altering table, and Exec_Master_Log_Pos stays unchanged.

【Solution】

For case 1, use the processlist or information_schema.innodb_trx to find the query that blocks the DDL statement, then kill it so that the DDL can run normally on the slave.
For case 2, the delay caused by the DDL itself is hard to avoid. Consider the following:
① Run the DDL during off-peak hours. ② Run SET sql_log_bin=0 in the session and then execute the DDL manually on the master and on the slave. (For some DDL operations this can leave the data inconsistent, so test it strictly first.)
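
Finding the blocker can be sketched as below. This is only an illustration under assumptions: the rows are expected to come from the query held in INNODB_TRX_SQL (those columns exist in information_schema.innodb_trx), the age threshold is arbitrary, and the function kill_candidates is a hypothetical helper that prints KILL statements for a human to review rather than executing them.

```python
from datetime import datetime, timedelta
from typing import Any, Collection, Dict, List

# Query whose result rows feed kill_candidates(); run it on the slave.
INNODB_TRX_SQL = (
    "SELECT trx_mysql_thread_id, trx_started, trx_query "
    "FROM information_schema.innodb_trx ORDER BY trx_started"
)


def kill_candidates(
    rows: List[Dict[str, Any]],
    now: datetime,
    min_age_seconds: int,
    exclude_thread_ids: Collection[int] = (),
) -> List[str]:
    """Return KILL statements for transactions open longer than min_age_seconds."""
    cutoff = now - timedelta(seconds=min_age_seconds)
    return [
        f"KILL {row['trx_mysql_thread_id']}"
        for row in rows
        if row["trx_started"] <= cutoff
        and row["trx_mysql_thread_id"] not in exclude_thread_ids
    ]
```

Pass the replication threads' ids in exclude_thread_ids so they are never listed, and check each remaining candidate against the processlist before killing anything.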

〇The master and the slave are configured differently

【Cause Analysis】

Hardware: for example, the master server uses SSDs while the slave server uses ordinary SAS disks, or the CPU clock speeds differ.
Configuration: for example, different RAID card write policies, different OS kernel parameters, or different MySQL flush-to-disk settings.

【Solution】

Keep the configuration of the DB machines as uniform as possible, including hardware and option parameters.
For some OLAP workloads, the slave is even given better hardware than the master.

〇The table lacks a primary key or unique index

With binlog_format=row, if a table has no primary key or unique index, slave lag can rise sharply during UPDATE and DELETE.
The symptoms on the slave are:
Slave_SQL_Running_State is Reading event from the relay log.
The table keeps appearing in SHOW OPEN TABLES WHERE in_use=1.
Exec_Master_Log_Pos stays unchanged.
The mysqld process uses almost 100% CPU (even with no read traffic), while IO pressure is low.

【Cause Analysis】

Take an extreme case: the master updates 200,000 rows in a table of 5 million rows, and the UPDATE statement needs a full table scan. In row format, 200,000 row update events are recorded in the binlog. When the SQL Thread replays them, each row update may need its own full table scan, so replay is very slow.
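
The cost in this example is easy to put into numbers. The calculation below is a back-of-the-envelope worst case using the figures from the paragraph above; it ignores any optimisation the server may apply, and rows_examined is just an illustrative helper.

```python
def rows_examined(row_events: int, table_rows: int, has_usable_key: bool) -> int:
    """Rough worst-case count of rows the SQL Thread examines to find the changed rows."""
    # With a key, each row event is a point lookup; without one, a full scan.
    per_event = 1 if has_usable_key else table_rows
    return row_events * per_event


print(rows_examined(200_000, 5_000_000, has_usable_key=False))  # 1000000000000
print(rows_examined(200_000, 5_000_000, has_usable_key=True))   # 200000
```

On the master the statement scanned the table once; on the slave, without a key, the worst case is one scan per changed row.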

【Solution】

Check the table structure to ensure that each table has an explicit auto-increment primary key and create appropriate indexes.
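
One way to run this check across a whole instance is to compare the list of tables with the list of tables that have a PRIMARY KEY or UNIQUE constraint. The sketch below assumes you run the two information_schema queries yourself and pass in the resulting rows; the function name tables_without_key is hypothetical.

```python
from typing import Iterable, List, Tuple

Table = Tuple[str, str]  # (table_schema, table_name)

# All user tables.
TABLES_SQL = (
    "SELECT table_schema, table_name FROM information_schema.tables "
    "WHERE table_type = 'BASE TABLE' AND table_schema NOT IN "
    "('mysql', 'sys', 'information_schema', 'performance_schema')"
)
# Tables that have a primary key or a unique constraint.
KEYED_SQL = (
    "SELECT table_schema, table_name FROM information_schema.table_constraints "
    "WHERE constraint_type IN ('PRIMARY KEY', 'UNIQUE')"
)


def tables_without_key(tables: Iterable[Table], keyed: Iterable[Table]) -> List[Table]:
    """Return the tables that appear in `tables` but have no PK/unique constraint."""
    keyed_set = {tuple(row) for row in keyed}
    return sorted(tuple(row) for row in tables if tuple(row) not in keyed_set)


all_tables = [("shop", "orders"), ("shop", "audit_log"), ("shop", "users")]
with_key = [("shop", "orders"), ("shop", "users"), ("shop", "users")]
print(tables_without_key(all_tables, with_key))  # [('shop', 'audit_log')]
```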

〇The slave itself is under too much load

【Cause Analysis】

The slave is serving a large number of SELECT requests, or most of the application's SELECT traffic (possibly including heavy OLAP queries) is routed to it, or a backup is running on the slave.
In these cases the CPU load or IO utilization may be too high, and the SQL Thread applies events too slowly.

【Solution】

Create more slaves to distribute read requests and reduce the pressure on existing slave instances.

〇MyISAM storage engine

In this case Slave_SQL_Running_State on the slave is Waiting for table level lock.

【Cause Analysis】

MyISAM supports only table-level locks, so reads and writes cannot run concurrently.
With a suitable @@concurrent_insert setting, the master can run INSERT concurrently with SELECT, but the slave's SQL Thread cannot do so during replay. If you are interested, take a look at the MyISAM implementation.

【Solution】

You can simply live with it: having chosen MyISAM, you should be prepared for this. (MyISAM is also not recommended in a replication setup for other reasons.)
Better, change to InnoDB.

Summary:

Use SHOW SLAVE STATUS and SHOW PROCESSLIST to view the current state of the slave. (This also lets you rule out a backup running on the slave.)
If Exec_Master_Log_Pos does not change, suspect a large transaction, DDL, or a table without a primary key, and check the corresponding binlog file and position on the master.
If Exec_Master_Log_Pos keeps advancing but the lag gradually increases, look at the load on the slave machine (IO, CPU, and so on), and consider whether the master's write volume or the slave's own workload is too high.
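
The decision steps above can be written down as a small helper. This is a sketch under assumptions, not a complete diagnostic: it takes two SHOW SLAVE STATUS rows sampled a few seconds apart (already fetched as dicts), and the function name triage and the wording of the returned hints are illustrative.

```python
from typing import Any, Mapping


def triage(first: Mapping[str, Any], second: Mapping[str, Any]) -> str:
    """Compare two SHOW SLAVE STATUS samples and suggest where to look."""
    lag_before = first.get("Seconds_Behind_Master") or 0
    lag_after = second.get("Seconds_Behind_Master") or 0
    if lag_after == 0:
        return "no delay reported"

    # Exec_Master_Log_Pos is a position inside Relay_Master_Log_File.
    pos_before = (first.get("Relay_Master_Log_File"), first.get("Exec_Master_Log_Pos"))
    pos_after = (second.get("Relay_Master_Log_File"), second.get("Exec_Master_Log_Pos"))
    if pos_before == pos_after:
        state = str(second.get("Slave_SQL_Running_State", "")).lower()
        if "metadata lock" in state:
            return "DDL blocked by a metadata lock"
        if "altering table" in state:
            return "DDL executing"
        if "table level lock" in state:
            return "MyISAM table lock"
        return "large transaction, DDL, or table without a primary key"

    if lag_after > lag_before:
        return "check slave load (IO, CPU) and master write volume"
    return "catching up"
```

The hints only narrow the search; the binlog on the master and the processlist on the slave are still needed to confirm the cause.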

If none of the above reasons apply, please ask the DBA experts for help.

Of course, Seconds_Behind_Master is not necessarily accurate. In a few scenarios, although Seconds_Behind_Master is 0, the master-slave data is inconsistent.
That will be another blog post for another time.
