Automatic failover of slave nodes in replication architecture in MySQL 8.0.23

Automatic failover of slave nodes in replication architecture in MySQL 8.0.23

I have been in contact with MGR for some time. With the arrival of MySQL 8.0.23, the high-availability architecture based on MySQL Group Replicaion (MGR) provides new architectural ideas.

How can the slave in the disaster recovery room better support the MGR in the main computer room?

How many nodes can MGR fail?

This time I will briefly talk to you about some of the ideas and functions of MGR based on the above two questions.

1. Fault tolerance of the number of MySQL Group Relication members

I believe that everyone is familiar with the table above. I often ask in interviews: "How many MGR nodes of 4 nodes can fail at most?" Most people answer: "At most one node can fail. If two nodes fail, the brain will split and the system will not work."

Then let’s take a look at how MGR handles it. Is this the answer?

1) We have a 4-node MGR

Here is a question: This diagram is obviously in Single mode, but the arrow is not unidirectional. Is it drawn wrong?

2) At this time, Second-04 suddenly crashes, so what will happen to the MGR cluster?

The cluster status will become:

  • Each node will exchange its own information at a fixed time.
  • When no information from the Second-04 node is received, other members will wait for 5 seconds.
  • During this period, Second-04 definitely did not send any messages, so the healthy members considered Second-04 to be in a suspicious state and marked it as UNREACHABLE.
  • Then the healthy member continues to wait according to the parameter: group_replication_member_expel_timeout (at this time Second-04 is still in UNREACHABLE status).
  • When the group_replication_member_expel_timeout time is exceeded, the healthy member expels the Second-04 node from the cluster.

Now here comes the point, pay attention to the blackboard

In Second-04, when not expelled:

At this time, the cluster is (4 nodes - 3 healthy - 1 bad). If one node fails during this period, the cluster becomes (4 nodes - 2 healthy - 2 bad). The cluster does not meet the majority principle, and each node cannot be written (unless manual intervention is performed to forcibly specify the cluster member list).

In Second-04, after being ejected:

At this time, the cluster is (3 nodes - 3 healthy - 0 bad). The 4-node cluster degenerates into a 3-node healthy cluster. At this time, the cluster can still continue to fail a node and become (3 nodes - 2 healthy - 1 bad).

Therefore, whether one or two nodes in a 4-node cluster can fail depends on the stage of the cluster processing.

PS:

Let’s talk about the problem we just mentioned: This picture is obviously Single mode, but the arrow is not unidirectional. Is it drawn wrong?

First, in Single mode, the Second node cannot be written by default, but this is only because the super-read-only setting of the Second node is enabled.

Set the Second node super-read-only = 0. The Second node can be written normally and can synchronize with other nodes (Primary and other Second nodes). The transmission is still based on the Paxos protocol.

Run a train: The Second node reversely synchronizes other nodes without going through the conflict detection phase (the theoretical efficiency is higher than the multi-write mode). It has not been verified, so you can study it if you are interested.

2. Asynchronous Connection Failover

MySQL 8.0.22 introduced asynchronous replication connection failover. Many friends have published articles to introduce it. Here I will just briefly describe it:

1) One master and one slave in the same computer room, and a slave node in a remote computer room

2) Master fails, Slave-01 becomes the Master, and Slave-02 cannot connect to the original Master

3) If "asynchronous connection failover configuration" is configured for Slave-02, then after Slave-02 recognizes the failure of the original Master, it will automatically try to establish a replication relationship with the original Slave-01 (new Master) according to the pre-defined configuration:

This feature is very good. The referenced third-party tools (such as MHA's repair of master-slave relationships) can be replaced by MySQL native functions.

But after I finished testing, I had some doubts:

1. Doesn’t “asynchronous” replication failover support semi-synchronous architecture? It cannot ensure that data will not be lost, and it cannot completely replace MHA?
Answer: Actually, it supports enhanced semi-synchronization.

2. If the Master List for failover needs to be pre-configured, then if the architecture of computer room A changes, do we still need to maintain the nodes in computer room B?
A: Yes.

3. If computer room A is MGR, then the node (master) of MGR is abnormal, but the service is not shut down and can be accessed, then shouldn’t the node in computer room B be always connected?
A: Yes

Then, MySQL 8.0.23 was released, which brought this feature enhancement:

Slave can support MGR clusters and can dynamically identify MGR members to establish a Master-Slave relationship.

Finally, let's run a lap:

1) First we have a 3-node MGR cluster, version 8.0.22 (asynchronous connection failover works on the Slave's IO Thread, so the Slave version is 8.0.23)

+----------------------------+-------------+--------------+-------------+---------------------+
| now(6) | member_host | member_state | member_role | VIEW_ID |
+----------------------------+-------------+--------------+-------------+---------------------+
| 2021-01-22 13:41:27.902251 | mysql-01 | ONLINE | SECONDARY | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-02 | ONLINE | PRIMARY | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-03 | ONLINE | SECONDARY | 16112906030396799:9 |
+----------------------------+-------------+--------------+-------------+---------------------+

2) Then we specify the "failover list for Master connection" on the independent Slave node

SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1', 'mysql-02', 3306, '', 80, 60);

Briefly explain the parameters:
ch1: chanel name GroupReplication: a hard-coded parameter. Currently supports MGR clusters aaaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1: MGR group name (parameter group_replication_group_name)
mysql-02: One of the MGR members 80: The priority of the Primary node (0-100). If there are multiple masters with the same priority, a node is randomly selected to serve as the master.
60: The priority of the Second node (0-100), which is basically prepared for Single mode

3) Specify replication channel information for Slave

CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='123456', SOURCE_HOST='mysql-02',SOURCE_PORT=3306,SOURCE_RETRY_COUNT=2,SOURCE_CONNECTION_AUTO_FAILOVER=1,SOURCE_AUTO_POSITION=1 For CHANNEL 'ch1';

4) Start the slave and check the "Connected Transferable List"

If the io thread is not enabled, the MGR members will not be automatically recognized. And copy the user

rpl_user needs to have select permissions on performance_schema on the MGR node

start slave;
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1 | mysql-01 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-02 | 3306 | | 80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-03 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

5) Then we stop group_replication in mysql-02 (not shutting down the service),

The slave list automatically eliminates mysql-02 and reconnects with other nodes -- mysql-02 (Primary):

stop group_replication;

-- Slave:
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1 | mysql-01 | 3306 | | 80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
| ch1 | mysql-03 | 3306 | | 60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

show slave status\G
*************************** 1. row ***************************
        Slave_IO_State: Waiting for master to send event
         Master_Host: mysql-01
         Master_User: rpl_user
         Master_Port: 3306
        Connect_Retry: 60
       Master_Log_File: mybinlog.000003
     Read_Master_Log_Pos: 4904
        Relay_Log_File:mysql-01-relay-bin-ch1.000065
        Relay_Log_Pos: 439
    Relay_Master_Log_File: mybinlog.000003
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      ...

Now the configuration is complete. Later, when MGR nodes are added or removed, Slave can automatically maintain this list. No other use cases will be posted.

PS:

If you want to manually switch the connection of the Master node (Primary) established by the Slave to another node (Second), you only need to delete the "Transferable List of Replication Connections" and readjust the Second priority and add it back.

-- Delete configuration SELECT asynchronous_connection_failover_delete_managed('ch1', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaa1');


-- Re-add and adjust the priority of Second to be higher than Primary
SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaaaaaaa-aaaa-aaaa-aaaaaaaaaa1', 'mysql-03', 3306, '', 60, 80);

Reference Links:

https://mysqlhighavailability.com/automatic-asynchronous-replication-connection-failover/

https://my.oschina.net/u/4591256/blog/4813037

https://dev.mysql.com/doc/refman/8.0/en/replication-functions-source-list.html

This is the end of this article about automatic failover of replication architecture slave nodes in MySQL 8.0.23. For more information about MySQL automatic failover, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Comprehensive analysis of MySql master-slave replication mechanism
  • Solution to MySQL replication failure caused by disk fullness
  • Detailed explanation of MySQL master-slave replication and read-write separation
  • How to copy MySQL table
  • MYSQL database GTID realizes master-slave replication (super convenient)
  • Implementation principle and configuration of MySql master-slave replication
  • A brief analysis of MySQL's WriteSet parallel replication
  • MySQL master-slave replication principle and points to note
  • How to dynamically modify the replication filter in mysql
  • A brief analysis of MySQL parallel replication
  • Analysis of three parameters of MySQL replication problem

<<:  A thorough analysis of HTML special characters

>>:  CSS example code for setting scroll bar style

Recommend

Several important MySQL variables

There are many MySQL variables, some of which are...

Issues with upgrading Python and installing Mongodb drivers under Centos

Check the Python version python -V If it is below...

Web Design Tutorial (8): Web Page Hierarchy and Space Design

<br />Previous article: Web Design Tutorial ...

HTML+CSS+jQuery imitates the search hot list tab effect with screenshots

Copy code The code is as follows: <!DOCTYPE ht...

Teach you how to achieve vertical centering elegantly (recommended)

Preface There are many ways to center horizontall...

How to modify the ssh port number in Centos8 environment

Table of contents Preface start Preface The defau...

Detailed explanation of how to view the number of MySQL server threads

This article uses an example to describe how to v...

Exploration of three underlying mechanisms of React global state management

Table of contents Preface props context state Sum...

Implementation of vue-nuxt login authentication

Table of contents introduce Link start Continue t...

How to manage cached pages in Vue

Table of contents Problem 1: Destruction 1. How t...

JS realizes the automatic playback effect of pictures

This article shares the specific code of JS to ac...

Detailed explanation of CSS3 rotating cube problem

3D coordinate concept When an element rotates, it...