How does MySQL ensure master-slave consistency?

How does MySQL ensure master-slave consistency?

Question: As we all know, binlog can be used for archiving and master-slave synchronization, but what is its content? Why can the standby database be consistent with the primary database after executing binlog?

The basic principle of MySQL master-slave

insert image description here

Figure 1 MySQL master-slave switch process

In state 1, the client's read and write operations directly access node A, and node B is the backup database of A. It only synchronizes all updates of A and executes them locally. This keeps the data on nodes B and A the same.
When switching is required, switch to state 2. At this time, the client reads and writes to node B, and node A is the backup database of B.
In state 1, although node B is not directly accessed, it is recommended that you set node B (that is, the standby database) to read-only mode . There are several considerations for doing this:

  • Sometimes some operational query statements will be put on the standby database for query. Setting it to read-only can prevent misoperation.
  • Prevent bugs in the switching logic , such as double writing during the switching process, which causes inconsistency between the primary and standby .
  • The readonly state can be used to determine the role of a node .

I have set the backup database to read-only. How can I keep it updated synchronously with the primary database?
You don't have to worry about this problem. Because the readonly setting is invalid for users with super privileges, and the thread used for synchronous updates has super privileges.

insert image description here

Figure 2 Active/standby flow chart

After receiving the update request from the client, the master database executes the update logic of the internal transaction and writes the binlog at the same time. A long connection is maintained between the standby database B and the primary database A. There is a thread inside the master database A, which is dedicated to serving the long connection of the standby database B. The complete process of a transaction log synchronization is as follows:

  • On standby database B, use the change master command to set the IP, port, user name, password of master database A, and the position from which to start requesting binlog, which includes the file name and log offset.
  • Execute the start slave command on slave database B. The slave database will start two threads, namely io_thread and sql_thread in the figure. Among them, io_thread is responsible for establishing a connection with the main library.
  • After master database A verifies the user name and password, it starts reading the binlog from the local location transmitted by standby database B and sends it to B.
  • After receiving the binlog, standby database B writes it to a local file, called a relay log.
  • sql_thread reads the transfer log, parses the commands in the log, and executes them.

Later, with the introduction of multi-threaded replication solutions, sql_thread evolved into multiple threads .

Comparison of three binlog formats

Binlog has two formats, one is statement and the other is row. You may also see a third format called mixed in other materials. In fact, it is a mixture of the first two formats.

mysql> CREATE TABLE t (
	id int(11) NOT NULL,
	a int(11) DEFAULT NULL,
	t_modified timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
	PRIMARY KEY (id),
	KEY a (a),
	KEY t_modified(t_modified)
)ENGINE=InnoDB;

insert into t values(1,1,'2018-11-13');
insert into t values(2,2,'2018-11-12');
insert into t values(3,3,'2018-11-11');
insert into t values(4,4,'2018-11-10');
insert into t values(5,5,'2018-11-09');

If you want to delete a row of data in the table, let's see how the binlog of this delete statement is recorded.

mysql> delete from t /comment/ where a>=4 and t_modified<='2018-11-10' limit 1;

When binlog_format=statement, the original text of the SQL statement is recorded in binlog . You can use the mysql> show binlog events in 'master.000001'; command to view the contents of the binlog.

insert image description here

Figure 3. Example of statement format binlog

Now, let's look at the output in Figure 3.

  • You can ignore the first line SET @@SESSION.GTID_NEXT='ANONYMOUS' for now. We will mention it later in the article when introducing the active/standby switch.
  • The second line is a BEGIN, which corresponds to the commit in the fourth line, indicating that there is a transaction in between.
  • The third line is the actual execution statement. You can see that there is a "use 'test'" command before the actual delete command is executed. This command is not actively executed by us, but is added by MySQL based on the database where the table to be operated is located. Doing so ensures that when the log is transferred to the standby database for execution, no matter which database the current working thread is in, it can correctly update table t in the test database.
  • The delete statement after the use 'test' command is the original SQL text we entered. As you can see, binlog "faithfully" records the SQL commands and even the comments.
  • The last line is a COMMIT. You can see xid=61 written inside.

To illustrate the difference between statement and row formats, let's look at the execution effect diagram of this delete command:

insert image description here

Figure 4 delete execution warnings

Running this delete command generates a warning because the current binlog is set to statement format and there is a limit in the statement , so this command may be unsafe. Why do I say so? This is because the delete command has a limit, which may cause inconsistency between the primary and standby data .

If the delete statement uses index a, the first row that meets the condition will be found based on index a, that is, the row with a=4 will be deleted;
But if the index t_modified is used, then the row with t_modified='2018-11-09', that is, a=5, will be deleted.

Because the original text of the statement is recorded in the binlog in the statement format, the following situation may occur: when the SQL statement is executed on the primary database, index a is used; but when the SQL statement is executed on the standby database, index t_modified is used. Therefore, MySQL considers it risky to write like this .

If I change the binlog format to binlog_format='row', will this problem go away?

insert image description here

Figure 5 Example of row format binlog

Compared with the statement format binlog, the BEGIN and COMMIT before and after are the same. However, the original text of the SQL statement is no longer in the row-format binlog. Instead, it is replaced by two events: Table_map and Delete_rows.

  • Table_map event, used to indicate that the next table to be operated is table t in the test database;
  • Delete_rows event, used to define the behavior of deletion.

With the help of mysqlbinlog tool, use the following command to parse and view the contents of binlog. The binlog of this transaction starts at position 8900, so the start-position parameter can be used to specify that the log parsing starts from this position.

mysqlbinlog -vv data/master.000001 --start-position=8900;

insert image description here

Figure 6 Details of a row format binlog example

From this figure, we can see the following information:

  • Server id 1 means that this transaction is executed on the database with server_id=1.
  • Each event has a CRC32 value, this is because I set the parameter binlog_checksum to CRC32.
  • The table_map event is the same as that seen in Figure 5, showing the next table to be opened, mapped to number 226. Now our SQL statement only operates one table, what if we want to operate multiple tables? Each table has a corresponding Table_map event, which is mapped to a separate number to distinguish operations on different tables.
  • We used the -vv parameter in the mysqlbinlog command to parse out all the contents, so that we can see the values ​​of each field (for example, @1=4, @2=4) from the result.
  • The default configuration of binlog_row_image is FULL, so Delete_event contains the values ​​of all fields of the deleted row. If binlog_row_image is set to MINIMAL, only necessary information will be recorded. In this example, only the information about id=4 will be recorded.

The last Xid event is used to indicate that the transaction was successfully committed.

When binlog_format uses the row format, the primary key ID of the actual deleted row is recorded in the binlog. In this way, when the binlog is transferred to the slave database, the row with id=4 will definitely be deleted, and there will be no problem of the master and slave deleting different rows .

Why is there a mixed format binlog?

Because some statement-format binlogs may cause inconsistencies between the master and slave, the row format should be used .

But the disadvantage of the row format is that it takes up a lot of space . For example, if you use a delete statement to delete 100,000 rows of data , a SQL statement will be recorded in the binlog, taking up dozens of bytes of space. But if you use binlog in row format, you need to write all 100,000 records into binlog. Doing so will not only take up more space, but also consume IO resources to write binlog, affecting the execution speed .

Therefore, MySQL took a compromise solution, that is, there is a mixed format binlog. The mixed format means that MySQL will determine whether this SQL statement may cause inconsistency between the master and the slave. If possible, it will use the row format; otherwise, it will use the statement format . In other words, the mixed format can take advantage of the advantages of the statment format while avoiding the risk of data inconsistency.

Now more and more scenarios require that the MySQL binlog format be set to row . There are many reasons to do this, here's one that's immediately obvious: data recovery .

We will look at the data recovery issue from the perspective of three SQL statements: delete, insert, and update.

  • As you can see from Figure 6, even if I execute a delete statement, the row format binlog will save the entire row information of the deleted row. Therefore, if you find that you have deleted the wrong data after executing a delete statement, you can directly convert the delete statement recorded in the binlog into an insert, and then restore the deleted data by inserting it back.
  • What if you execute the wrong insert statement? That's more direct. In row format, all field information is recorded in the binlog of the insert statement, which can be used to accurately locate the row that has just been inserted. At this time, you can simply convert the insert statement into a delete statement and delete the row of data that was inserted by mistake.
  • If an update statement is executed, the binlog will record the entire row of data before and after the modification. Therefore, if you execute an update statement by mistake, you only need to swap the two lines of information before and after the event and then execute it in the database to restore the update operation.

Data operation errors caused by delete, insert or update statements often require recovery to the state before the operation. MariaDB's Flashback tool rolls back data based on the principles described above.

mysql> insert into t values(10,10,now());

If you set the binlog format to mixed, do you think MySQL will record it in row format or statement format?

insert image description here

Figure 7 mixed format and now()

MySQL actually uses the statement format. Next, let's use the mysqlbinlog tool to take a look:

insert image description here

Figure 8 TIMESTAMP command

It turns out that when binlog records the event, it records an extra command: SET TIMESTAMP=1546103491. It uses the SET TIMESTAMP command to agree on the return time of the subsequent now() function . Through this SET TIMESTAMP command, MySQL ensures the consistency of the primary and standby data.

When replaying binlog data , this is done: use mysqlbinlog to parse the log, and then directly copy the statements in it and execute them .
As you now know, this approach is risky. Because the execution results of some statements depend on the context commands, the results of direct execution are likely to be wrong .

Therefore, the standard way to use binlog to recover data is to parse it using the mysqlbinlog tool and then send the entire parsed result to MySQL for execution . A command similar to the following:

mysqlbinlog master.000001 --start-position=2738 --stop-position=2942 | mysql -h127.0.0.1 -P13000 -u$user -p$pwd;

This command means to parse out the content from byte 2738 to byte 2942 in the master.000001 file and put it into MySQL for execution.

Circular replication problem

The binlog feature ensures that executing the same binlog on the slave database can achieve the same status as the master database. We can assume that under normal circumstances the data of the primary and backup devices are consistent . That is to say, the contents of nodes A and B in Figure 1 are consistent. In fact, what I drew in Figure 1 is the MS structure, but the dual-M structure is more commonly used in actual production , which is the active-standby switching process shown in Figure 9.

insert image description here

Figure 9 MySQL master-slave switch process - dual M structure

Nodes A and B are always in a master-slave relationship with each other. This way, there is no need to modify the master-slave relationship during switching.

However, the double-M structure still has a problem to be solved .

The business logic updates a statement on node A, and then sends the generated binlog to node B. Node B also generates a binlog after executing the update statement. (I suggest you set the parameter log_slave_updates to on, which means that the standby database generates binlog after executing the relay log).

Then, if node A is also the backup database of node B, it is equivalent to taking the newly generated binlog of node B and executing it again. Then, this update statement will be executed repeatedly between nodes A and B, which is circular replication. How to solve this problem ?

As can be seen in Figure 6 above, MySQL records the server id of the instance where this command was first executed in the binlog . Therefore, we can use the following logic to solve the problem of circular replication between two nodes:

  • It is stipulated that the server IDs of the two databases must be different . If they are the same, they cannot be set as the primary and backup relationship.
  • A standby database receives the binlog and generates a new binlog with the same server id as the original binlog during the replay process ;
  • After receiving the log sent from its main database, each database first determines the server id . If it is the same as its own, it means that the log was generated by itself, and it will directly discard the log.

According to this logic, if we set up a double M structure, the execution flow of the log will become like this:

  • For transactions updated from node A, the server ID of A is recorded in the binlog.
  • After being transmitted to node B and executed once, the server id of the binlog generated by node B is also the server id of A;
  • Then it is transmitted back to node A. When A determines that the server ID is the same as its own, it will no longer process the log. Therefore, the infinite loop is broken here.

Summarize:

This concludes this article on how MySQL ensures consistency between the primary and backup servers. For more information on how MySQL ensures consistency between the primary and backup servers, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Repair solution for inconsistent MySQL GTID master and slave

<<:  Pure CSS to adjust Div height according to adaptive width (percentage)

>>:  N ways to vertically center a div with single/multi-line text (unknown height/fixed height)

Recommend

Tutorial on using Multitail command on Linux

MultiTail is a software used to monitor multiple ...

Detailed explanation of MySQL high availability architecture

Table of contents introduction MySQL High Availab...

Methods and steps to build nginx file server based on docker

1. Create a new configuration file docker_nginx.c...

Analysis of log files in the tomcat logs directory (summary)

Each time tomcat is started, the following log fi...

How to deploy MySQL and Redis services using Docker

Table of contents How to deploy MySQL service usi...

Detailed explanation of MySQL's MERGE storage engine

The MERGE storage engine treats a group of MyISAM...

9 great JavaScript framework scripts for drawing charts on the web

9 great JavaScript framework scripts for drawing ...

Detailed explanation of importing/exporting MySQL data in Docker container

Preface We all know that the import and export of...

Linux Jenkins configuration salve node implementation process diagram

Preface: Jenkins' Master-Slave distributed ar...

A brief discussion on several advantages of Vue3

Table of contents 1. Source code 1.1 Monorepo 1.2...

JS uses clip-path to implement dynamic area clipping function

background Today, I was browsing CodePen and saw ...

nginx proxy_cache batch cache clearing script introduction

Preface: I used the official nginx proxy_cache as...