Detailed explanation of the persistence implementation principle of transactions in MySQL

Detailed explanation of the persistence implementation principle of transactions in MySQL

Preface

When it comes to database transactions, a lot of transaction-related knowledge will easily pop up in everyone's mind, such as the ACID characteristics of transactions, isolation levels, problems solved (dirty reads, non-repeatable reads, phantom reads), etc., but few people may really understand how these transaction characteristics are implemented and why there are four isolation levels.

In the previous article, we have learned about the implementation principle of transaction isolation in MySQL. Today, let’s continue to talk about the implementation principle of MySQL persistence.

Of course, MySQL is extensive and profound, and omissions in the article are inevitable. Criticisms and corrections are welcome.

illustrate

The transaction implementation logic of MySQL is located at the engine layer, and not all engines support transactions. The following instructions are based on the InnoDB engine.

InnoDB reading and writing data principle

Before we continue, we need to understand how InnoDB reads and writes data. We know that database data is stored on disks, and we also know that the cost of disk I/O is very high. If the disk must be accessed every time data is read or written, the efficiency of the database will be very low. To solve this problem, InnoDB provides Buffer Pool as a buffer for accessing database data.

The Buffer Pool is located in memory and contains a mapping of some data pages on disk. When data needs to be read, InnoDB will first try to read it from the Buffer Pool. If it cannot read the data, it will read it from the disk and put it into the Buffer Pool. When writing data, it will first write to the Buffer Pool page, mark such page as dirty, and put it on a special flush list. These modified data pages will be flushed to the disk at some point in the future (this process is called dirty flushing, which is the responsibility of other background threads). As shown in the following figure:

The advantage of this design is that it can convert a large amount of disk I/O into memory reading and writing, and merge multiple modifications to a page into one I/O operation (flushing the entire page once it is dirty), avoiding access to the disk for each read and write operation, thereby greatly improving the performance of the database.

Persistence Definition

Persistence means that once a transaction is committed, its changes to the database should be permanent, and subsequent operations or failures should not have any impact on the modifications of this transaction.

From the previous introduction, we know that InnoDB uses Buffer Pool to improve the performance of reading and writing. However, the Buffer Pool is in memory and is volatile. If MySQL suddenly crashes after a transaction is committed, and the modified data in the Buffer Pool has not been refreshed to the disk at this time, data will be lost and the persistence of the transaction cannot be guaranteed.

To solve this problem, InnoDB introduced redo log to achieve persistence of data modifications. When data is modified, InnoDB not only modifies the data in the Buffer Pool, but also records the operation in the redo log and ensures that the redo log is written to the disk earlier than the corresponding page (usually when the transaction is committed), which is often called WAL. If MySQL crashes suddenly and the data has not been flushed back to disk, after restarting, MySQL will use the redo log that has been written to disk to recover the data pages that have not been flushed to disk.

Implementation principle: redo log

To improve performance, similar to data pages, redo logs also consist of two parts: one is the log buffer in memory, which is volatile; the other is the redo log file on the disk, which is persistent. Redo log is a physical log that records the status of physical pages in the database.

When data is modified, InnoDB not only modifies the data in the Buffer Pool, but also records the operation in the redo log buffer. When the transaction is committed, the redo log buffer is flushed to disk and recorded in the redo log file. If MySQL crashes, you can read the data in the redo log file when it restarts to restore the database. This way, there is no need to flush the data in real time every time a transaction is submitted.

Writing process

Note:

  • Modify the Buffer Pool first, then write the redo log buffer.
  • The redo log is written back to disk before the data page: When the transaction is committed, the redo log buffer is written to the redo log file. The commit is successful only if the write is successful (there are other scenarios that trigger the write, which will not be expanded here), and the data in the Buffer Pool is written to disk by the background thread at some subsequent time.
  • When flushing dirty data, we must ensure that the corresponding redo log has been written to the disk, which is the so-called WAL (write-ahead log), otherwise there is a possibility of data loss.

benefit

When a transaction is committed, writing to the redo log has three main advantages over directly flushing the log:

Flushing is random I/O, but writing redo logs is sequential I/O. Sequential I/O is much faster than random I/O and is not necessary.
The dirty flushing is done in units of data pages. Even if a page is only slightly modified, the entire page must be written. The redo log only contains the actually modified parts, and the amount of data is very small, which greatly reduces invalid IO.
When flushing data, many pages of data may need to be flushed, and atomicity cannot be guaranteed (for example, failure occurs when only part of the data is written). However, the redo log buffer writes log blocks to the redo log file in 512 bytes, which is the size of a sector. The sector is the smallest unit of writing, so it can be guaranteed that the writing is successful.

Should I write redo log first or modify data first?

A DML operation may involve data modification and redo log recording. What is the order in which they are executed? Some articles on the Internet say that data should be modified first and redo log should be recorded later, while others say that redo log should be recorded first and data should be modified later. So what is the real situation?

First of all, from the above description, we know that the redo log buffer will be written to the redo log file when the transaction is committed, and the flushing is at some subsequent time, so it is certain that the redo log is recorded first and the data page is modified later (of course, the WAL log is written first).

The next question is whether to write the redo log buffer first or modify the Buffer Pool first. To understand this problem, we first need to understand the execution process of a DML in InnoDB. The execution process of a DML involves data modification, locking, unlocking, redo log recording, and undo log recording, which also need to ensure atomicity. InnoDB uses MTR (Mini-transactions) to ensure the atomicity of a DML operation.

First, let’s look at the definition of MTR:

An internal phase of InnoDB processing, when making changes at the physical level to internal data structures during DML operations. A Mini-transactions (mtr) has no notion of rollback; multiple Mini-transactionss can occur within a single transaction. Mini-transactionss write information to the redo log that is used during crash recovery. A Mini-transactionss can also happen outside the context of a regular transaction, for example during purge processing by background threads. See https://dev.mysql.com/doc/refman/8.0/en/glossary.html

MTR is a short atomic operation that cannot be rolled back because it is atomic itself. Changes to data pages must go through MTR, which records changes to data pages caused by DML operations in the redo log.

Let’s take a quick look at the MTR process:

  • When MTR is initialized, a copy of mtr_buf will be initialized
  • When data is modified, while modifying the pages in the memory Buffer Pool, a redo log record is generated and saved in mtr_buf.
  • When the mtr_commit function is executed to commit the MTR, the redo log record in mtr_buf will be updated to the redo log buffer, and the dirty page will be added to the flush list for subsequent flushing. In the log buffer, every time a 496-byte log record is received, this group of log records is wrapped with a 12-byte block header and a 4-byte block tailer to become a 512-byte log block, which is convenient for aligning 512 bytes when flushing to disk.

It can be seen from this that InnoDB modifies the Buffer Pool first and then writes the redo log buffer.

The process of recovering data

In any case, InnoDB will attempt to perform recovery operations when it starts. During the recovery process, redo log is required, and if binlog is enabled, binlog and undo log are also required. Because it is possible that the data has been written to the binlog, but the database crashed before the redo log was flushed to disk (transactions are a feature of the InnoDB engine, and modified data may not be committed, while binlog is a feature of the MySQL service layer, and modified data will be recorded). At this time, the redo log, binlog, and undo log are required to determine whether there are any uncommitted transactions, and to roll back or commit the uncommitted transactions.

The following is a brief description of the process of restoring data using only redo logs:

  • When starting InnoDB, find the location of the most recent Checkpoint and use the Checkpoint LSN to find a redo log greater than the LSN for log recovery.
  • If the recovery fails in the middle, it will not have any impact. When you recover again, you can continue to recover from the position of the last successfully saved Checkpoint.

Recover process: Fault recovery consists of three stages: Analysis, Redo, and Undo. The task of the Analysis phase is to use the information in the Checkpoint and Log to confirm the operation scope of the subsequent Redo and Undo phases, correct the Dirty Page set information recorded in the Checkpoint through the Log, and use the smallest LSN position involved as the starting position RedoLSN of the next Redo. At the same time, the active transaction set (uncommitted transactions) recorded in the Checkpoint is corrected as the rollback object of the Undo process; the Redo stage starts from the RedoLSN obtained by Analysis and replays all the Redo contents in the Log. Note that this also includes uncommitted transactions; finally, the Undo stage rolls back all uncommitted transactions using the Undo information. All modifications that need to be rolled back can be found sequentially through the PrevLSN of the Log. For details, see http://catkang.github.io/2019/01/16/crash-recovery.html

What is LSN?

LSN, also known as log sequence number, is a monotonically increasing 64-bit unsigned integer. Both redo log and data pages store LSN, which can be used as a basis for data recovery. A larger LSN indicates that the change described by the referenced log record occurred later.

What is Checkpoint?

Checkpoint represents a save point, and all changes to the data page before this point (log LSN < Checkpoint LSN) have been written to the disk file. InnoDB records a Checkpoint after each disk flush and records the latest redo log LSN in the Checkpoint LSN, which is convenient for determining the starting point when restoring data.

The above is a detailed explanation of the persistence implementation principle of transactions in MySQL. For more information about the persistence of MySQL transactions, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Detailed explanation of the implementation principle of transaction isolation level in MySQL
  • Detailed explanation of the syntax and process of executing MySQL transactions
  • Description of the default transaction isolation level of mysql and oracle
  • MySQL transaction autocommit automatic commit operation
  • MySQL common statements for viewing transactions and locks
  • MySQL master-slave synchronization, implementation principle of transaction rollback
  • In-depth analysis of MySQL database transactions and locks
  • Will Update in a Mysql transaction lock the table?
  • In-depth understanding of PHP+MySQL distributed transactions and solutions
  • How does MySQL implement ACID transactions?
  • Why MySQL should avoid large transactions and how to solve them

<<:  Full analysis of web page elements

>>:  Vue realizes screen adaptation of large screen pages

Recommend

How to encapsulate the table component of Vue Element

When encapsulating Vue components, I will still u...

HTML head tag meta to achieve refresh redirection

Copy code The code is as follows: <html> &l...

Two simple menu navigation bar examples

Menu bar example 1: Copy code The code is as foll...

Detailed explanation of Vue's live broadcast function

Recently, the company happened to be doing live b...

MySql common query command operation list

MYSQL commonly used query commands: mysql> sel...

The final solution to Chrome's minimum font size limit of 12px

I believe that many users who make websites will ...

5 ways to make your JavaScript codebase cleaner

Table of contents 1. Use default parameters inste...

Build a Scala environment under Linux and write a simple Scala program

It is very simple to install Scala environment in...

Nginx access control and parameter tuning methods

Nginx global variables There are many global vari...

Detailed explanation of application scenarios of filters in Vue

filter is generally used to filter certain values...

Install centos7 virtual machine on win10

1. Download VMware Workstation 64 version https:/...