Solution to MySQL replication failure caused by disk fullness

Case scenario

A production issue surfaced today: because of a gap in monitoring coverage, the disk on one machine filled up and broke the online MySQL master-slave replication. The symptom was as follows:

localhost.(none)>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 10.xx.xx.xx
                  Master_User: replica
                  Master_Port: 5511
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: relay-bin.001605
                Relay_Log_Pos: 9489761
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: No
                   Last_Errno: 13121
                   Last_Error: Relay log read failure: Could not parse relay log event entry.
 The possible reasons are: the master's binary log is corrupted (you can check this by running
 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by
 running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a
 keyring key required to open an encrypted relay log file, or a bug in the master's or
 slave's MySQL code. If you want to check the master's binary log or slave's relay log,
 you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

So I checked the error log and found the following content in the error log:

2021-03-31T11:34:39.367173+08:00 11 [Warning] [MY-010897] [Repl] Storing MySQL user name or
 password information in the master info repository is not secure and is therefore not
 recommended. Please consider using the USER and PASSWORD connection options for START SLAVE;
 see the 'START SLAVE Syntax' in the MySQL Manual for more information.

2021-03-31T11:34:39.368161+08:00 12 [ERROR] [MY-010596] [Repl] Error reading relay log
 event for channel '': binlog truncated in the middle of event; consider out of disk space

2021-03-31T11:34:39.368191+08:00 12 [ERROR] [MY-013121] [Repl] Slave SQL for channel '': Relay
 log read failure: Could not parse relay log event entry. The possible reasons are: the master's
 binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log),
 the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log),
 a network problem, the server was unable to fetch a keyring key required to open an encrypted
 relay log file, or a bug in the master's or slave's MySQL code. If you want to check the
 master's binary log or the slave's relay log, you will be able to know their names by issuing 'SHOW
 SLAVE STATUS' on this slave. Error_code: MY-013121

2021-03-31T11:34:39.368205+08:00 12 [ERROR] [MY-010586] [Repl] Error running query, slave SQL
 thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We
 stopped at log 'mysql-bin.000446' position 9489626

As the messages show, the error log is quite helpful here: it detected the disk problem and hinted at it with "consider out of disk space".

Solving the problem

After logging into the server, I quickly found that disk usage on the machine hosting MySQL had reached 100%, which matched what the error log reported.
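Since the root cause was a gap in disk monitoring, a lightweight usage check could have caught the problem long before replication broke. Below is a minimal sketch in Python; treating `/var/lib/mysql` as the datadir is an assumption, so the example falls back to checking `/`:

```python
import shutil

def disk_usage_pct(path: str) -> float:
    """Return the used-space percentage of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0

def check_disk(path: str, threshold: float = 90.0) -> bool:
    """Return True if usage is below `threshold`, False if the disk is (nearly) full."""
    pct = disk_usage_pct(path)
    if pct >= threshold:
        print("ALERT: {0} is {1:.1f}% full".format(path, pct))
        return False
    return True

# /var/lib/mysql is the conventional MySQL datadir and is an assumption here;
# fall back to "/" so the sketch runs on any machine.
check_disk("/")
```

Run something like this periodically from cron or a monitoring agent against the real datadir; alerting well before 100% leaves room to react before the relay log gets truncated mid-event.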

Now to fix it. The basic idea was to free up disk space and then rebuild the replication relationship. That sounds straightforward, but in practice the following error appeared while rebuilding replication:

### Replication is GTID-based; I want to rebuild the replication relationship
localhost.(none)>reset slave;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

localhost.(none)>reset slave all;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

Step 1: Since replication is GTID-based, the plan was to record the output of show slave status, then run reset slave and re-issue a change master statement to rebuild the replication relationship.

However, the error above appeared instead. It says MySQL could not purge the relay logs itself, which seemed odd. Fine: if MySQL can't purge the relay logs on its own, I'll do it by hand.

Step 2: I manually deleted all relay logs with rm -f, after which the error changed to:

localhost.(none)>reset slave all;
ERROR 1374 (HY000): I/O error reading log index file

Well, okay, the problem was not solved.

Then I reconsidered: since reset slave could not clean up the relay logs, maybe I could skip that step entirely.

Would change master work without a prior reset?

Step 3: Run stop slave directly, then change master, without executing reset slave all. The result was as follows:

localhost.(none)>change master to master_host='10.xx.xx.xx',
    -> master_user='replica',
    -> master_password='xxxxx',
    -> master_port=5510,
    -> master_auto_position=1;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

Well, the problem remains.

Step 4: Replication was already broken with an error anyway, so I executed start slave just to see what would happen. The result was dramatic:

localhost.(none)>start slave;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 262
Current database: *** NONE ***


Query OK, 0 rows affected (0.01 sec)


localhost.(none)>
[root@ ~]#

After executing start slave, the instance crashed outright.

At this point replication was completely broken and the slave instance was down.

Step 5: Check whether the instance could be restarted. It could: after restarting it, I checked the replication state and saw the following:

localhost.(none)>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: 10.xx.xx.xx
                  Master_User: replica
                  Master_Port: 5511
                Connect_Retry: 60
              Master_Log_File:
           Read_Master_Log_Pos: 4
               Relay_Log_File: relay-bin.001605
                Relay_Log_Pos: 9489761
        Relay_Master_Log_File:
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                   Last_Errno: 13121
                   Last_Error: Relay log read failure: Could not parse relay log event entry.
 The possible reasons are: the master's binary log is corrupted (you can check this by running
 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by
 running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a
 keyring key required to open an encrypted relay log file, or a bug in the master's or slave's
 MySQL code. If you want to check the master's binary log or slave's relay log, you will be able
 to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0

Replication still failed with the same error.

Step 6: Try reset slave all again to see whether it succeeds this time.

localhost.(none)>stop slave;
Query OK, 0 rows affected (0.00 sec)


localhost.(none)>reset slave all;
Query OK, 0 rows affected (0.03 sec)

Step 7: Re-establish the replication relationship and start replication:

localhost.(none)>change master to master_host='10.xx.xx.xx',
    -> master_user='replica',
    -> master_password='xxxxx',
    -> master_port=5511,
    -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)


localhost.(none)>start slave;
Query OK, 0 rows affected (0.00 sec)


localhost.(none)>show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.xx.xx.xx
                  Master_User: replica
                  Master_Port: 5511
                Connect_Retry: 60
                          ...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

The replication relationship was successfully re-established.

Summary

When the disk fills up, MySQL can no longer write to its replication metadata tables, and the relay log may be left incomplete. If you merely free up disk space and then run change master to repair the master-slave relationship, the operation can fail and the channel cannot be repaired in place, because this is not a normal replication-break scenario.

So, the correct approach should be:

1. Clean up the server's disk

2. Restart the slave instance whose replication is broken

3. Run reset slave all and then change master to rebuild the replication relationship
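The SQL side of this recovery can be sketched as an ordered statement list. This is a minimal illustration, not a MySQL API: `recovery_statements` is a hypothetical helper, the connection values are placeholders for your own topology, and the restart in step 2 must happen before these statements are run.

```python
def recovery_statements(host: str, user: str, password: str, port: int) -> list:
    """Ordered SQL to rebuild GTID-based replication after the slave restart.

    Assumes GTID mode (MASTER_AUTO_POSITION=1), matching the scenario above.
    """
    change_master = (
        "CHANGE MASTER TO MASTER_HOST='{0}', MASTER_USER='{1}', "
        "MASTER_PASSWORD='{2}', MASTER_PORT={3}, MASTER_AUTO_POSITION=1"
    ).format(host, user, password, port)
    return [
        "STOP SLAVE",       # ensure both replication threads are stopped
        "RESET SLAVE ALL",  # discard the corrupted relay logs and channel metadata
        change_master,      # point the slave back at the master by GTID
        "START SLAVE",      # restart replication
    ]

# Placeholder values, mirroring the masked ones used in this article.
for stmt in recovery_statements("10.xx.xx.xx", "replica", "xxxxx", 5511):
    print(stmt + ";")
```

Because GTID auto-positioning is used, no binlog file name or position needs to be recorded before the reset; the slave negotiates its resume point with the master.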

If there is a better way, please let me know.
