MySQL master-slave synchronization mechanism and synchronization delay problem tracking process

Preface

As a DBA, you will often run into MySQL master-slave synchronization delays at work. Slow synchronization has many possible causes: network problems between the master and the slave, insufficient network bandwidth, large transactions, or single-threaded replication.

I ran into one such problem today: MySQL kept reporting errors, and the reported master-slave synchronization delay was either too large or invalid. This article shares how master-slave synchronization works and how I approached the troubleshooting.

Symptoms

The most visible symptom is:

mysql> show slave status\G
 // State 1 Seconds_Behind_Master: NULL
 // State 2 Seconds_Behind_Master: 0
 // State 3 Seconds_Behind_Master: 79

When queried repeatedly, the value is 0 most of the time, but occasionally shows NULL or a delay such as 79. This keeps triggering the master-slave synchronization delay alarm.

Causes and solutions

Several slave servers had the same server-id, so the master could not keep a stable connection to any one of them, and synchronization could not proceed normally.

After changing the server-id to a unique value and restarting the database, replication recovers.

Master-slave synchronization mechanism

MySQL master-slave synchronization, also known as replication, is a built-in high-availability and high-performance cluster solution. Its main uses are:

  • Data distribution: Replication does not require much bandwidth and can copy data across multiple data centers.
  • Read load balancing: With a cluster of servers, you can spread reads across the slaves using load-balancing methods such as GSLB (global server load balancing), DNS round robin, or Linux LVS, which reduces the read pressure on the master.
  • Database backup: Replication is a useful part of a backup strategy, but it is not a replacement for backups. It still needs to be combined with snapshots.
  • High availability and failover: A slave can be promoted to master quickly, reducing downtime and recovery time.

Master-slave synchronization is divided into 3 steps:

  1. The master server (master) records data changes in the binary log (binlog).
  2. The slave server copies the binary log of the master server to its own relay log.
  3. The slave server replays the events in its relay log and applies the changes to its own database, bringing its data in line with the master.
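
The three steps above can be illustrated with a toy model. This is only a sketch to show why copying and applying are separate phases; the class names ToyMaster and ToySlave and their methods are made up for illustration and do not reflect how MySQL implements replication internally.

```python
from collections import deque


class ToyMaster:
    """Toy stand-in for a master: every change is appended to a binlog list."""

    def __init__(self):
        self.data = {}
        self.binlog = []  # step 1: ordered record of data changes

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


class ToySlave:
    """Toy stand-in for a slave with a relay log and two separate phases."""

    def __init__(self):
        self.data = {}
        self.relay_log = deque()
        self.read_pos = 0  # how far into the master's binlog we have copied

    def io_step(self, master):
        # step 2: copy new binlog events into the local relay log
        new_events = master.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)
        return len(new_events)

    def sql_step(self):
        # step 3: replay relay log events against the local data
        applied = 0
        while self.relay_log:
            key, value = self.relay_log.popleft()
            self.data[key] = value
            applied += 1
        return applied


master = ToyMaster()
slave = ToySlave()
master.write("a", 1)
master.write("b", 2)

slave.io_step(master)                # events are copied but not yet applied
lagging = slave.data != master.data  # the slave is behind at this point
slave.sql_step()                     # replaying the relay log catches up
in_sync = slave.data == master.data
```

In this model the slave is behind between io_step and sql_step, which is the window in which a real slave would report a delay.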

Master-slave synchronization is asynchronous: changes are transmitted in near real time, but there is a delay before they are executed on the slave. If the master is under heavy load, the delay increases accordingly.

As the figure above shows, three threads are involved in total:

  1. The log transfer thread (binlog dump thread) on the master: sends new binary log events to the slave.
  2. The I/O thread on the slave: reads the master's binary log events and saves them to the relay log.
  3. The SQL thread on the slave: executes the events in the relay log.

View MySQL Threads

We can use the show full processlist; command to view the threads running in MySQL:

Status of the master:

Status of the slave:

As you can see, my cluster consists of 1 master and 4 slaves, so the master has 4 replication threads (each has sent all binlog data to its slave and is waiting for the binlog to be updated) and 1 thread running the show full processlist command itself. Each slave has 1 command thread, 1 I/O thread (waiting for the master to send events), and 1 SQL thread (which has read all of the relay log and is waiting for the I/O thread to update it).

View Sync Status

Because master-slave synchronization is asynchronous, there will be some delay. We can run show slave status; on the slave to view it:

The properties to pay attention to (marked in red in the output) are:

  • Slave_IO_State: The current state of the I/O thread.
  • Master_Log_File: The master binary log file that the I/O thread is currently reading.
  • Read_Master_Log_Pos: The position, in bytes, up to which the I/O thread has read the master's binary log file. As shown in the figure, about 13 MB (13630580/1024/1024) has been read.
  • Relay_Master_Log_File: The master binary log file containing the most recent event executed by the SQL thread.
  • Slave_IO_Running: Whether the I/O thread on the slave is running. Yes means it is running normally.
  • Slave_SQL_Running: Whether the SQL thread on the slave is running. Yes means it is running normally.
  • Exec_Master_Log_Pos: The position in the master's binary log up to which the SQL thread has executed.
  • Seconds_Behind_Master: How many seconds the slave lags behind the master. NULL means the lag cannot be determined.
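
For monitoring scripts it can help to turn the vertical output of show slave status into a dictionary. The following is a minimal sketch, assuming the usual "Name: value" layout of the \G output; the function name parse_slave_status is made up for this example, and the sample text only contains values quoted in this article.

```python
def parse_slave_status(text):
    """Parse the vertical output of SHOW SLAVE STATUS into a dict of strings."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "*** 1. row ***" separator.
        if not line or line.startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


sample = """
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
          Read_Master_Log_Pos: 13630580
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
"""

status = parse_slave_status(sample)
# Convert the byte offset to megabytes, as in the Read_Master_Log_Pos note.
read_mb = int(status["Read_Master_Log_Pos"]) / 1024 / 1024
```

All values are kept as strings, so a caller has to handle NULL explicitly before converting Seconds_Behind_Master to a number.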

You can also run show master status; on the master to view its current binary log file and position:

Normal master-slave synchronization status:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
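
A monitoring check built on these three fields might look like the sketch below. The function name replication_health, the result labels, and the 60-second threshold are all assumptions chosen for illustration; pick a threshold that fits your own alerting policy.

```python
def replication_health(status, max_lag_seconds=60):
    """Classify a slave from three SHOW SLAVE STATUS fields (given as strings)."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")

    if not (io_ok and sql_ok):
        return "broken"    # one of the replication threads is not running
    if lag is None or lag == "NULL":
        return "unknown"   # lag cannot be determined
    if int(lag) > max_lag_seconds:
        return "lagging"
    return "ok"


healthy = replication_health(
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": "0"}
)
delayed = replication_health(
    {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": "79"}
)
stopped = replication_health(
    {"Slave_IO_Running": "No", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": "NULL"}
)
```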

Troubleshooting

Now that we understand the master-slave synchronization mechanism, let's look at the problem we encountered today. Checking the status of the slave, we observed the key attributes in three different states:

mysql> show slave status\G
#State 1:
 Slave_IO_State: Reconnecting after a failed master event read
 Slave_IO_Running: No
 Slave_SQL_Running: Yes
 Seconds_Behind_Master: NULL
#State 2:
 Slave_IO_State: Waiting for master to send event
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Seconds_Behind_Master: 0
#State 3:
 Slave_IO_State: Queueing master event to the relay log
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Seconds_Behind_Master: 636

The documentation on MySQL replication thread states explains what each of the three states means:

# State 1: Reconnecting after a failed master event read
The thread is trying to reconnect to the master. When the connection is reestablished, the state changes to Waiting for master to send event.

# State 2: Waiting for master to send event
The thread has connected to the master and is waiting for binary log events to arrive. This can last a long time if the master is idle. If the wait lasts for slave_net_timeout seconds, a timeout occurs; the thread then considers the connection broken and attempts to reconnect.

# State 3: Queueing master event to the relay log
The thread has read an event and is copying it to the relay log so that the SQL thread can process it.

From this we can guess that, for some reason, the slave keeps disconnecting from the master, reconnecting, and then disconnecting again after the reconnection succeeds.
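
One way to confirm this kind of flapping is to sample Slave_IO_State at intervals and count how often the I/O thread enters a reconnecting state. The sketch below assumes the samples have already been collected into a list; the function name count_reconnects and the sample sequence are made up for illustration.

```python
def count_reconnects(io_states):
    """Count how many times the I/O thread entered a reconnecting state."""
    reconnects = 0
    was_reconnecting = False
    for state in io_states:
        is_reconnecting = state.startswith("Reconnecting")
        # Count only transitions into the state, not repeated samples of it.
        if is_reconnecting and not was_reconnecting:
            reconnects += 1
        was_reconnecting = is_reconnecting
    return reconnects


samples = [
    "Waiting for master to send event",
    "Reconnecting after a failed master event read",
    "Reconnecting after a failed master event read",
    "Waiting for master to send event",
    "Queueing master event to the relay log",
    "Reconnecting after a failed master event read",
]
flaps = count_reconnects(samples)
```

A healthy slave should show a count at or near zero over a long sampling window, so a steadily growing count points to a connection problem rather than simple load.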

Let's take a look at the master's threads:

We found that the problem occurred on two slaves, 10.144.63.* and 10.144.68.*. We checked the error log on one of them:

190214 11:33:20 [Note] Slave: received end packet from server, apparent master shutdown:
190214 11:33:20 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.005682' at postion 13628070
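
To see how often this happens, the error log can be scanned for the retry message. The following is a small sketch that assumes the message format shown above; it accepts both the log's own spelling "postion" and "position", and the names RETRY_RE and find_retries are made up for this example.

```python
import re

# Matches the retry message and captures the binlog file name and offset.
RETRY_RE = re.compile(
    r"Failed reading log event, reconnecting to retry, "
    r"log '([^']+)' at posi?tion (\d+)"
)


def find_retries(log_text):
    """Return (binlog_file, position) for every I/O thread retry in the log."""
    return [(m.group(1), int(m.group(2))) for m in RETRY_RE.finditer(log_text)]


log_sample = (
    "190214 11:33:20 [Note] Slave: received end packet from server, "
    "apparent master shutdown:\n"
    "190214 11:33:20 [Note] Slave I/O thread: Failed reading log event, "
    "reconnecting to retry, log 'mysql-bin.005682' at postion 13628070\n"
)
retries = find_retries(log_sample)
```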

Searching Google for the message Slave: received end packet from server, apparent master shutdown: leads to the article Confusing MySQL Replication Error Message, which explains that the cause is two slaves having the same server-id. Quoting from that page:

One day it happened to me, and took me almost an hour to find that out.
Moving foward I always use a base my.cnf to I copy to any other server and the first thing is to increase the server-id.
Could MySQL just use the servername intead of a numeric value?

Fixing the problem

Having located the problem, we checked for duplication and found that the server-id values on the two slaves were indeed the same:

vim my.cnf

#replication
log-bin=mysql-bin
# This number is the same on both slaves
server-id=177230069
sync_binlog=1

Change it to a different number on one of the slaves, save the file, and restart the MySQL process. The alarm then clears.
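
To catch this before it causes trouble, the configuration files of all servers can be compared for duplicate server-id values. Below is a minimal sketch, assuming the text of each my.cnf has already been collected into a dictionary keyed by a host label; the function name and the host labels are hypothetical.

```python
import re
from collections import defaultdict

# Matches an uncommented server-id (or server_id) setting at the start of a line.
SERVER_ID_RE = re.compile(r"^\s*server[-_]id\s*=\s*(\d+)", re.MULTILINE)


def find_duplicate_server_ids(configs):
    """Map each duplicated server-id to the list of hosts that use it."""
    hosts_by_id = defaultdict(list)
    for host, text in configs.items():
        match = SERVER_ID_RE.search(text)
        if match:
            hosts_by_id[int(match.group(1))].append(host)
    return {sid: hosts for sid, hosts in hosts_by_id.items() if len(hosts) > 1}


configs = {
    "slave-a": "#replication\nlog-bin=mysql-bin\nserver-id=177230069\nsync_binlog=1\n",
    "slave-b": "#replication\nlog-bin=mysql-bin\nserver-id=177230069\nsync_binlog=1\n",
    "slave-c": "#replication\nlog-bin=mysql-bin\nserver-id=177230070\nsync_binlog=1\n",
}
duplicates = find_duplicate_server_ids(configs)
```

On a running server, the effective value can also be checked with SELECT @@server_id; which is useful when the value on disk and the value in memory may differ.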

Summary

In the end, the fix was very simple, but going from confusion at the start to a clear line of reasoning at the end is typical of troubleshooting. The main takeaways from this article are the mechanism of master-slave synchronization and an approach to tracing such problems. I hope it helps you resolve master-slave synchronization issues quickly next time.

References

  • "MySQL Basics: InnoDB Storage Engine 2nd Edition", Section 8.7, Replication
  • MySQL master-slave replication thread states
  • Confusing MySQL Replication Error Message