Summary of some thoughts on binlog optimization in MYSQL

Summary of some thoughts on binlog optimization in MYSQL

question

Question 1: How to solve the performance loss caused by flushing redo log when committing a transaction?

WAL is a common technology for achieving transaction persistence (D). The basic principle is to record transaction modifications in redo log. Redo log is written sequentially in append order. When a transaction is committed, you only need to ensure that the redo log of the transaction is written to the disk. By replacing random page writes with sequential redo log writes, the performance of the database system is improved. However, this solution requires that the redo log generated by each transaction be flushed to disk once when it is committed, which is inefficient.

Question 2: The order of committing binlog and engine-level transactions

For a single transaction, the order of log writing is redo log first and then binlog. As long as this order is maintained, correctness can be maintained. But for a highly concurrent database system, there may be many concurrently executed transactions at all times. We also need to use certain means to maintain the sequential consistency of server-layer binlog and engine-layer transaction submission.

Maintaining this sequential consistency is actually to ensure the correctness of the backup tool Xtrabackup.

When binlog acts as a coordinator, if the transaction order recorded in it is different from the order recorded at the storage engine layer, there may be gaps in the locations of the backup set obtained by the backup tool (Innodb Hot Backup). Because the backup tool will copy the redo log, the binlog position corresponding to the last committed transaction will be recorded in the redo header. After the backup set is created, the binlog will continue to be dumped from the primary database based on this position.

Suppose there are three transactions T1, T2, and T3 that have been fsynced to the binlog file. The positions of the three transactions in the file are 100, 200, and 300 respectively. However, at the engine layer, only T1 and T3 have completed the commit and recorded in the redo log. The position of the last committed transaction T3 is 300. At this time, the data obtained by the backup tool is in this state. When the backup set is started, the crash recovery process will be followed, and the prepare transaction will be rolled back (the backup set will not back up the binlog file, corresponding to the empty xid set in the previous section). Since point 300, binlog synchronization from the primary database continues and apply, resulting in T2 being lost in the standby database.

Therefore, we must design a mechanism to ensure that the binlog writing order at the server layer is consistent with the transaction submission order at the storage engine layer.

Problem 3: Performance degradation caused by writing redo and binlog at the same time

Question 1 mentioned that each transaction submission will cause performance problems, and this problem will become more serious after the introduction of binlog. Each transaction submission will increase file IO once and require disk flushing. If the system concurrency is high, these IOs will become a bottleneck that slows down the overall performance.

Solution

Question 1: Redo log group submission technology

The idea of ​​redo group commit technology is very simple: by merging the flushing actions of multiple transaction redo logs, the number of flushing times can be reduced. In the Innodb log system, each redo log has an LSN (Log Sequence Number). When a transaction copies logs to the redo log buffer, it obtains the current maximum LSN, and the LSN increases monotonically, thus ensuring that the LSNs of different transactions will not be repeated. Then suppose that the maximum LSNs of the logs of the three transactions Trx1, Trx2, and Trx3 are LSN1, LSN2, and LSN3 respectively (LSN1 < LSN2 < LSN3), and they are committed at the same time. If trx3 is committed first, it will request to flush the disk to LSN3, which will also flush the redo logs of Trx1 and Trx2. Trx1 and Trx2 will determine that their own LSNs are smaller than the maximum LSN currently flushed to the disk, so there is no need to flush the disk again.

Problem 2: Internal XA Transactions

When binlog is enabled, internal XA transactions are introduced to coordinate the upper layer and the storage engine layer. Specifically, two stages are introduced when a transaction is committed:

Prepare: flushes the redo log to disk to ensure that updates to data pages and undo pages have been flushed to disk, and sets the transaction status to PREPARE;

commit: 1). Write binlog and flush to disk, 2). Call the engine layer transaction commit interface. Set the transaction status to COMMIT.

Such two-phase commit is mainly to ensure correctness when the database crashes. Because once the binlog is written to the disk, it may be consumed by downstream nodes. Such transactions must be committed rather than rolled back after restart. For transactions that have not been written to the disk in binlog, they are rolled back directly during crash recovery.

Specifically, during fault recovery, scan the last binlog file (in the flush phase, if the binlog size exceeds the threshold, rotate the binlog file to ensure that the last transaction recorded in the file is committed) and extract the xid from it. Redo the redo log after the checkpoint, read the undo segment information of the transaction, collect the transaction list in the prepare stage, compare the xid of the transaction with the xid recorded in the binlog, and commit if it exists, otherwise roll back.

Before MySQL 5.6, in order to ensure that the write order of the database binlog is consistent with the transaction commit order of the InnoDB layer, the MySQL database uses the prepare_commit_mutex lock internally.

Specifically, the lock is added during the two-phase commit engine layer prepare, and the lock is released after the engine layer commits:

innobase_xa_prepare()
write() and fsync() binary log
innobase_commit()

This can indeed ensure that the transaction order of binlog and innodb is consistent, but this lock will cause all transactions to be executed serially, and each submission will call at least multiple fsyncs, which is very inefficient. This is also an issue that needs to be explored and resolved next.

Question 4

Refer to the redo log optimization technology and introduce the group commit technology to optimize the binlog writing performance.

Consider the transaction submission process when it is not optimized:

Prepare: In this phase, the redo log of the storage engine layer (innodb) is flushed and the transaction status is set to PREPARED (update the transaction status on the undo page). Binlog is not involved in this phase.
Commit: Write binlog logs and flush to disk. At the same time, the engine layer releases locks, releases rollback segments, sets the transaction status to COMMITTED, and other so-called group commit technologies. In essence, they split the time-consuming commit steps into more fine-grained steps. Specifically:

Divide the commit of step 2 into three stages:

Flush: write binlog, but not sync
Sync: Call fsync operation to write the file to disk
Commit: Call the storage engine interface to commit the transaction

The fsync here is a time-consuming operation, so we hope to accumulate enough writes before making an fsync call, and use batch technology here. The principle is: each stage in the above steps has a corresponding task linked list, and each thread entering this stage will add its own task to the linked list, and the linked list is locked to ensure correctness. The first thread to join the linked list becomes the Leader, and subsequent threads become Followers. All tasks in the linked list form a Batch, which is executed by the Leader, while the Followers just wait for their tasks to be completed.

Once the linked list tasks of a certain stage are completed, these tasks will enter the next stage and be added to the task linked list of that stage, repeating the above execution flow.

This design has the following advantages:

  1. Using Leader execution instead of each thread can effectively reduce the number of write/fsync calls and improve efficiency
  2. It can ensure that the order of transaction writing binlog and engine layer submission is consistent
  3. Multiple transactions can be executed concurrently without being forced to be serialized by the prepare_commit_mutex lock.

In addition, MYSQL has further optimized the flushing of redo logs in the prepare phase. The original design allowed multiple transactions to flush redo logs concurrently, which was also inefficient. The redo log flushing in the prepare phase can be performed in the Flush phase of the commit phase. However, there is a small problem that needs to be explained: before the optimization, each thread is responsible for flushing its own redo log and knows the LSN of the redo log that needs to be flushed. If the leader thread flushes the redo logs to disk in the flush phase, it does not know the LSN of each thread's redo log, so it simply and roughly flushes to the maximum LSN of log_sys, which ensures that the redo logs of the transactions to be submitted can be flushed to disk.

Summarize

This is the end of this article about binlog optimization thinking in MYSQL. For more relevant MYSQL binlog optimization thinking content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • MySQL series: redo log, undo log and binlog detailed explanation
  • Specific use of MySQL binlog_ignore_db parameter
  • How to choose the format when using binlog in MySQL
  • Detailed explanation of the binlog log analysis tool for monitoring MySQL: Canal
  • In-depth explanation of binlog in MySQL 8.0
  • Detailed explanation of MySQL database binlog cleanup command
  • Steps to enable MySQL database monitoring binlog
  • How to distinguish MySQL's innodb_flush_log_at_trx_commit and sync_binlog

<<:  Win10 configuration tomcat environment variables tutorial diagram

>>:  Detailed implementation plan of Vue front-end exporting Excel files

Recommend

JavaScript to implement the function of changing avatar

This article shares the specific code of JavaScri...

A Brief Analysis of MySQL - MVCC

Version Chain In InnoDB engine tables, there are ...

What does this.parentNode.parentNode (parent node of parent node) mean?

The parent node of the parent node, for example, t...

Summary of practical methods for JS beginners to process arrays

join() method: connects all elements in an array ...

How to generate a free certificate using openssl

1: What is openssl? What is its function? What is...

An article to show you how to create and use Vue components

Table of contents 1. What is a component? 2. Crea...

Analysis of the process of simply deploying nginx in Docker container

1. Deploy nginx service in container The centos:7...

Detailed explanation of the relationship between Vue and VueComponent

The following case reviews the knowledge points o...

An example of the calculation function calc in CSS in website layout

calc is a function in CSS that is used to calcula...

Chinese website user experience rankings

<br />User experience is increasingly valued...

How to modify the forgotten password when installing MySQL on Mac

1. Install MySQL database on mac 1. Download MySQ...

A brief analysis of how to use border and display attributes in CSS

Introduction to border properties border property...