Detailed explanation of redo log and undo log in MySQL

The most important logs in the MySQL log system are the redo log and the binlog (archive log): the binlog is a log of the MySQL Server layer, while the redo log belongs to the InnoDB storage engine layer.

1 redo log

1.1 What is redo log

The redo log is used to ensure the durability of transactions, which is the D in ACID.

Durability: once a transaction is committed, its changes to the data in the database are permanent and should not be affected by any subsequent database failure.

There are two types of redo logs: physical redo logs and logical redo logs. In InnoDB, the redo log is a physical log in most cases, recording the physical changes of data pages (actual data values).

1.2 Functions of redo log

The main function of redo log is to recover data when the database crashes.

1.3 Composition of redo log

The redo log consists of the following two parts:

  • The redo log buffer, stored in memory.
  • The redo log files, stored on disk.

1.4 When to record redo logs

After data is modified in memory, the change is first recorded in the redo log buffer; the modified (dirty) page is flushed to disk later. That is: modify first, then write the log.

Dirty page: a page in memory whose data is inconsistent with the corresponding data on disk.

In the following cases, redo logs are written from the redo log buffer to redo log files on disk.

  • When the redo log buffer's logs occupy half of the total capacity of the redo log buffer, the redo logs are written to disk.
  • When a transaction commits, its redo log is flushed to disk, which ensures that data is never lost (the most common case). Note that at this point all dirty pages in memory may not have been written to disk.
  • The background thread refreshes periodically, and there is a background thread that writes the redo log to disk every second.
  • When MySQL is shut down, redo logs are written to disk.

The first and fourth cases always write the redo logs to disk. The behavior in the second and third cases depends on the value of the parameter innodb_flush_log_at_trx_commit, which is described in detail below.
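The four trigger conditions above can be sketched as a toy model. This is a conceptual simulation only, not InnoDB internals; the class and method names are invented for illustration.

```python
# Toy model of the four redo-log-buffer flush triggers described above.
# Purely conceptual; not InnoDB's actual implementation.

class RedoLogBuffer:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.records = []        # records waiting in the in-memory buffer
        self.on_disk = []        # records already in the redo log file

    def flush(self, reason):
        """Write all buffered records to 'disk' and report why."""
        self.on_disk.extend(self.records)
        self.records.clear()
        return reason

    def append(self, record):
        self.records.append(record)
        # Trigger 1: the buffer is half full
        if len(self.records) >= self.capacity // 2:
            return self.flush("half-full")
        return None

    def commit(self):
        # Trigger 2: transaction commit (the most common case)
        return self.flush("commit")

    def background_tick(self):
        # Trigger 3: a background thread flushes roughly once per second
        return self.flush("timer")

    def shutdown(self):
        # Trigger 4: server shutdown
        return self.flush("shutdown")

buf = RedoLogBuffer(capacity=4)
buf.append("redo#1")
reason = buf.append("redo#2")    # reaches half of capacity 4 -> flushed
print(reason, len(buf.on_disk))  # -> half-full 2
```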

The creation of an index also requires recording the redo log.

1.5 An example of the whole redo process

Take an update transaction as an example.

  • Read the original data into memory and modify the in-memory copy of the data.
  • Generate a redo log and write it to the redo log buffer. The redo log stores the new value after the modification.
  • When a transaction is committed, the contents of the redo log buffer are flushed to the redo log file.
  • The dirty pages in memory are then flushed back to disk normally.

1.6 Guarantee of persistence

1.6.1 Force Log at Commit Mechanism

The Force Log at Commit mechanism implements transaction durability. While a transaction operates on data in memory, its logs are written to the redo log buffer; but before the transaction can be considered committed, all of its logs must first be written to the redo log files on disk.

To ensure that each log is written to the redo log file, an fsync system call must be used to ensure that the log in the OS buffer is completely written to the log file on disk.

fsync system call: it takes a file descriptor (fd) as its argument and operates on the file that the fd points to. fsync does not return until the disk write has completed, so when it returns successfully, the data is guaranteed to be safely on disk. This makes fsync well suited to programs such as databases.
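A minimal illustration of this call in Python, where os.fsync wraps the same system call (the file path and record contents are arbitrary):

```python
import os
import tempfile

# Write a 'log record' and force it to durable storage with fsync,
# mirroring what InnoDB does for the redo log file at commit time.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"redo record: page 42, offset 7, new value 'abc'\n")
    # write() alone may leave the data in the OS buffer cache;
    # fsync() does not return until the disk write has completed.
    os.fsync(fd)
finally:
    os.close(fd)

with open(path, "rb") as f:
    data = f.read()
os.remove(path)
print(data.startswith(b"redo record"))  # -> True
```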

1.6.2 innodb_flush_log_at_trx_commit Parameter

InnoDB provides a parameter innodb_flush_log_at_trx_commit to control the strategy of flushing logs to disk.

  • When innodb_flush_log_at_trx_commit is 1 (the default), every transaction commit writes the log in the log buffer to the OS buffer and calls fsync() to write it to disk.

With this setting no data is lost even if the system crashes, but because every commit writes to disk, I/O performance is poor.

  • When innodb_flush_log_at_trx_commit is 0, a transaction commit does not write the log buffer to the OS buffer. Instead, once per second the log buffer is written to the OS buffer and fsync() is called to write it to the log file on disk.

This is effectively a user-maintained buffer in memory; it reduces traffic to the OS buffer and gives the best performance.

If the system crashes, up to one second of data is lost, since the disk is written only once per second.

  • When innodb_flush_log_at_trx_commit is 2, every commit writes the log only to the OS buffer, and fsync() is then called once per second to write the logs in the OS buffer to the log file on disk.

Because the logs reach the OS buffer at commit time, they survive a MySQL process crash, and even between fsync() calls the operating system gradually writes its buffers to disk on its own. So when a crash occurs, less data is lost than in the second case (value 0).

At the same time, since every commit writes to the OS buffer, performance is somewhat worse than with value 0, but still much better than with value 1.
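The trade-offs of the three settings can be made concrete with a toy crash model. This is conceptual only; the function and variable names are invented, and "once per second" flushing is modeled as "the crash happens before the next flush".

```python
# Toy crash model for innodb_flush_log_at_trx_commit.
# Three stores model the path: log buffer -> OS buffer -> disk.
# A MySQL-only crash loses the log buffer; an OS crash also loses the OS buffer.

def committed_survivors(mode, commits, os_crash):
    log_buf, os_buf, disk = [], [], []
    for txn in commits:
        if mode == 1:          # write + fsync at every commit
            disk.append(txn)
        elif mode == 2:        # write to OS buffer at commit, fsync ~1/sec
            os_buf.append(txn)
        else:                  # mode 0: stays in the log buffer, flushed ~1/sec
            log_buf.append(txn)
    # crash strikes before the next once-per-second flush:
    survivors = list(disk)
    if not os_crash:           # MySQL-only crash: the OS buffer survives
        survivors += os_buf
    return survivors

txns = ["t1", "t2", "t3"]
print(committed_survivors(1, txns, os_crash=True))   # -> ['t1', 't2', 't3']
print(committed_survivors(2, txns, os_crash=False))  # -> ['t1', 't2', 't3']
print(committed_survivors(2, txns, os_crash=True))   # -> []
print(committed_survivors(0, txns, os_crash=False))  # -> []
```

The model shows why value 2 sits between the extremes: it loses nothing on a MySQL process crash, but up to a second of commits on an OS crash.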

1.6.3 A small performance test

The performance gap between the options is huge. Let's do a simple test.

# Create a test table
drop table if exists test_flush_log;
create table test_flush_log(id int,name char(50))engine=innodb;

# Create a stored procedure that inserts a given number of rows into the test table
drop procedure if exists proc;
delimiter $$
create procedure proc(i int)
begin
    declare s int default 1;
    declare c char(50) default repeat('a',50);
    while s<=i do
        start transaction;
        insert into test_flush_log values(null,c);
        commit;
        set s=s+1;
    end while;
end$$
delimiter ;

One hundred thousand records are inserted below.

Ⅰ When innodb_flush_log_at_trx_commit value is 1

test> call proc(100000)
[2021-07-25 13:22:02] completed in 27 s 350 ms

It takes a full 27.35 s.

Ⅱ When innodb_flush_log_at_trx_commit value is 2

test> set @@global.innodb_flush_log_at_trx_commit=2;    
test> truncate test_flush_log;

test> call proc(100000)
[2021-07-25 13:27:33] completed in 5 s 774 ms

It only takes 5.774s, which greatly improves the performance.

III When innodb_flush_log_at_trx_commit value is 0

test> set @@global.innodb_flush_log_at_trx_commit=0;
test> truncate test_flush_log;

test> call proc(100000)
[2021-07-25 13:30:34] completed in 3 s 537 ms

It only takes 3.537s, with higher performance.

Obviously, performance is poor when innodb_flush_log_at_trx_commit is 1; changing it to 0 or 2 greatly improves performance. 0 is the fastest, though not much faster than 2.

Although changing the value to 0 or 2 greatly improves performance, it seriously weakens durability. Instead, we can modify the stored procedure to move the transaction start and commit outside the loop, committing all rows at once and reducing the I/O frequency.

drop procedure if exists proc;
delimiter $$
create procedure proc(i int)
begin
    declare s int default 1;
    declare c char(50) default repeat('a',50);
    start transaction;
    while s<=i DO
        insert into test_flush_log values(null,c);
        set s=s+1;
    end while;
    commit;
end$$
delimiter ;

1.6.4 Mini-transaction

A mini-transaction is the mechanism InnoDB uses to process small internal changes. It guarantees the consistency of the data within a data page under concurrent transaction operations and database exceptions.

Mini transactions must follow the following three protocols:

  • The FIX rule. An exclusive latch must be held when writing a page and a shared latch when reading one; in short, access must always be locked.
  • Write-Ahead Log (WAL). Before data is persisted, the corresponding log in memory must be persisted first. Each page has an LSN (Log Sequence Number); before a page is written to disk, all in-memory log records with sequence numbers less than the page's LSN must be written to disk first. WAL provides three persistence modes:

The most stringent is full-sync, fsync ensures that the record is flushed to disk before returning, maximizing data security.

The second level is write-only, which ensures that the records are written to the operating system. This allows data to survive process-level crashes.

The least strict is no-sync, which keeps records in a memory buffer and does not guarantee immediate writing to the file system.

  • Force-log-at-commit. All of a transaction's mini-transaction logs must be flushed to disk when the transaction commits.
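The WAL rule above (log records up to a page's LSN must be durable before the page itself is flushed) can be sketched as follows. This is a conceptual model with an invented class, not InnoDB code.

```python
# Conceptual sketch of the Write-Ahead Log rule: a data page may be
# flushed only after every log record up to its LSN is durable.

class WALSketch:
    def __init__(self):
        self.flushed_log_lsn = 0   # highest LSN durable in the log file

    def flush_log_up_to(self, lsn):
        self.flushed_log_lsn = max(self.flushed_log_lsn, lsn)

    def flush_page(self, page_lsn):
        # Enforce WAL: flush any missing log records before the page.
        if page_lsn > self.flushed_log_lsn:
            self.flush_log_up_to(page_lsn)
        return self.flushed_log_lsn >= page_lsn  # invariant holds afterwards

wal = WALSketch()
wal.flush_log_up_to(100)
print(wal.flush_page(90))    # log already covers LSN 90 -> True
print(wal.flush_page(150))   # forces a log flush to 150 first -> True
print(wal.flushed_log_lsn)   # -> 150
```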

1.7 The process of writing redo log

The figure above shows how the redo log is written to the log buffer. Each mini-transaction corresponds to each DML operation, such as an update statement.

  • Each data modification is written to the mini-transaction private buffer.
  • When the update statement completes, the redo log is written from the mini-transaction private buffer to the public log buffer in memory.
  • When an external transaction is committed, the redo log buffer is flushed to the redo log files.

1.8 Log Block

The redo log is stored in blocks, and each block is 512 bytes in size. Whether in the memory redo log buffer, operating system buffer or redo log file, it is stored in such 512-byte blocks.

Each log block header consists of the following four parts

  • log_block_hdr_no: (4 bytes) The location ID of the log block in the redo log buffer.
  • log_block_hdr_data_len: (2 bytes) The size of the log recorded in this log block. When the log block is full, it is 0x200, which means 512 bytes.
  • log_block_first_rec_group: (2 bytes) The starting offset position of the first log in the log block.
  • log_block_checkpoint_no: (4 bytes) The checkpoint information recorded in this block.
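The 12-byte header layout above (4 + 2 + 2 + 4 bytes) inside a 512-byte block can be illustrated with Python's struct module. This is an illustrative packing sketch only, not InnoDB's actual on-disk format, and it omits the block trailer.

```python
import struct

# Pack the four header fields listed above: 4 + 2 + 2 + 4 = 12 bytes,
# followed by the log payload, padded out to the 512-byte block size.
BLOCK_SIZE = 512
HEADER_FMT = ">IHHI"   # hdr_no, data_len, first_rec_group, checkpoint_no
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 12

def make_log_block(hdr_no, payload, first_rec_group, checkpoint_no):
    data_len = HEADER_SIZE + len(payload)   # bytes used in this block
    header = struct.pack(HEADER_FMT, hdr_no, data_len,
                         first_rec_group, checkpoint_no)
    body = payload.ljust(BLOCK_SIZE - HEADER_SIZE, b"\x00")
    return header + body

block = make_log_block(7, b"some redo bytes", HEADER_SIZE, 99)
print(len(block))                                   # -> 512
hdr_no, data_len, first_rec, ckpt = struct.unpack(
    HEADER_FMT, block[:HEADER_SIZE])
print(hdr_no, data_len, ckpt)                       # -> 7 27 99
```

A full block would report data_len = 0x200 (512), matching the description of log_block_hdr_data_len above.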

1.9 log group

A log group is a grouping of redo logs, consisting of multiple redo log files of the same size. The number of files is determined by the parameter innodb_log_files_in_group, which defaults to 2.

The group is a logical concept, but its directory can be set via the variable innodb_log_group_home_dir. The redo log files are placed in this directory, which defaults to datadir.

2 Undo log

2.1 About undo log

The purpose of undo log is to ensure the atomicity of database transactions.

Atomicity means that a transaction is an indivisible unit of work, and either all operations in a transaction occur or none of them occur.

  • Redo log records the behavior of transactions and is used to "redo" the data, but transactions sometimes need to be rolled back, and that is what the undo log is for. When we modify a record, an undo log recording the old version of the data is generated, so that when an older transaction needs to read the data, it can follow the undo chain to find the version visible to it.
  • Undo logs usually exist in the form of logical logs. We can assume that when a record is deleted, the undo log will generate a corresponding insert record, and vice versa. When a record is updated, a counter-update record is generated.
  • Undo logs are recorded in segments. Each undo operation occupies one undo log segment when recorded.
  • Undo log will also generate redo log, because undo log also needs to achieve persistent protection.
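The "logical inverse" behavior described in the points above can be sketched as follows: each change records the operation needed to undo it, so rollback replays the undo chain backwards. This is a conceptual model with invented structures, not InnoDB's undo record format.

```python
# Conceptual sketch of logical undo records: for every change we store
# the inverse operation, and rollback replays the undo chain in reverse.

def apply_with_undo(table, op, undo_chain):
    kind, key, value = op
    if kind == "insert":
        table[key] = value
        undo_chain.append(("delete", key, None))        # inverse of insert
    elif kind == "delete":
        undo_chain.append(("insert", key, table[key]))  # inverse of delete
        del table[key]
    elif kind == "update":
        undo_chain.append(("update", key, table[key]))  # counter-update: old value
        table[key] = value

def rollback(table, undo_chain):
    while undo_chain:
        apply_with_undo(table, undo_chain.pop(), [])    # discard nested undo

t = {"a": 1}
undo = []
apply_with_undo(t, ("update", "a", 2), undo)
apply_with_undo(t, ("insert", "b", 9), undo)
apply_with_undo(t, ("delete", "a", None), undo)
print(t)          # -> {'b': 9}
rollback(t, undo)
print(t)          # -> {'a': 1}
```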

2.2 undo log segment

To ensure that concurrent transactions do not conflict when writing their respective undo logs, InnoDB manages undo logs in segments. The segment that holds them is called a rollback segment, and each rollback segment contains 1024 undo log segments. MySQL 5.5 and later support 128 rollback segments, which can hold 128 × 1024 concurrent undo operations. The number of rollback segments can be set with the innodb_undo_logs parameter.

2.3 purge

MySQL handles operations on clustered index columns as follows. Consider a delete statement:

delete from t where a = 1;

If a has a clustered index (it is the primary key), no actual deletion is performed. Instead, the record whose primary key equals 1 has its delete flag set to 1; the record itself remains in the B+ tree. Similarly, an update does not modify the record in place: the old record is marked as deleted and a new record is created.

So, when are the old version records actually deleted?

InnoDB uses undo logs to delete old versions; this is called the purge operation. InnoDB runs dedicated purge threads (their number is configurable via innodb_purge_threads), and each purge thread performs a purge operation every 10 seconds.

InnoDB undo log design

The undo logs of multiple transactions are allowed to exist on one page, and the storage order of the undo logs is arbitrary. InnoDB maintains a history linked list, connecting the undo logs in the order in which the transactions were committed.

During the purge process, the InnoDB storage engine first finds the first record that needs to be cleaned from the history list, which is trx1 here. After the cleanup, the InnoDB storage engine will continue to look for records that can be cleaned in the Undo page where trx1 is located. Here, transaction trx3 will be found, and then trx5 will be found, but it is found that trx5 is referenced by other transactions and cannot be cleaned. Therefore, it will search in the history list again and find that the last record is trx2. Then, the Undo page where trx2 is located is found, and trx6 and trx4 are cleaned up in turn. Since all records in Undo page2 have been cleaned up, the Undo page can be reused.

The design mode of the InnoDB storage engine, which first searches for undo logs in the history list and then searches for undo logs in the Undo page, is to avoid a large number of random read operations, thereby improving the efficiency of purge.
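The purge walk described above (follow the history list in commit order, then sweep the rest of each undo page before returning to the list) can be sketched as follows. This is a conceptual, heavily simplified model with invented data structures.

```python
# Conceptual sketch of the purge walk: follow the history list in commit
# order, and after cleaning a record, sweep its whole undo page for other
# purgeable records to avoid random page reads.

def purge(history_list, pages, referenced):
    """history_list: txn ids in commit order; pages: page -> txns stored on it;
    referenced: txns still needed by other transactions (not purgeable)."""
    cleaned = []
    for txn in history_list:
        if txn in cleaned or txn in referenced:
            continue
        page = next(p for p, txns in pages.items() if txn in txns)
        for t in pages[page]:               # sweep the whole undo page
            if t not in cleaned and t not in referenced:
                cleaned.append(t)
    return cleaned

# The example from the text: trx5 is still referenced, so it is skipped.
pages = {"page1": ["trx1", "trx3", "trx5"], "page2": ["trx2", "trx6", "trx4"]}
history = ["trx1", "trx2", "trx3", "trx4", "trx5", "trx6"]
print(purge(history, pages, referenced={"trx5"}))
# -> ['trx1', 'trx3', 'trx2', 'trx6', 'trx4']
```

The output reproduces the order described in the text: trx1 and trx3 from the first page, then trx2, trx6, and trx4 from the second, with trx5 left alone.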

3 InnoDB Recovery Operations

3.1 Data page flushing rules and checkpoints

Data in the memory (buffer pool) that has not been flushed to disk is called dirty data. Since both data and logs exist in memory in the form of pages, dirty pages include both dirty data pages and dirty log pages.

In InnoDB, checkpoint is the only rule for flushing data to disk. After the checkpoint is triggered, the dirty data in the memory will be flushed to the disk.

There are two types of checkpoints in the innodb storage engine:

  • Sharp checkpoint: When reusing a redo log file (for example, when switching log files), all dirty data recorded in the redo log is flushed to disk.
  • Fuzzy checkpoint: Only a small part of the log is flushed to disk at a time, rather than flushing all dirty logs to disk. The following situations will trigger this checkpoint:

  • Master thread checkpoint. Controlled by the master thread, which flushes a certain percentage of dirty pages to disk every second or every 10 seconds.
  • flush_lru_list checkpoint. Starting with MySQL 5.6, the innodb_page_cleaners variable specifies the number of page cleaner threads responsible for flushing dirty pages to disk; their purpose is to ensure that free pages remain available in the LRU list.
  • async/sync flush checkpoint. Synchronous or asynchronous flushing. If a great many dirty pages have not yet been flushed (how many is "too many" is controlled by a ratio), they are flushed synchronously, though this rarely happens; if there are fewer, they are flushed asynchronously; if there are only a few, flushing can be deferred for now.
  • Dirty page too much checkpoint. Triggered when there are too many dirty pages, to ensure enough free space in the buffer pool. The "too much" ratio is controlled by the variable innodb_max_dirty_pages_pct; the default in MySQL 5.6 is 75, meaning that when dirty pages occupy 75% of the buffer pool, some of them are forced to disk.
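The last trigger can be illustrated numerically. The 75% default corresponds to innodb_max_dirty_pages_pct as described above; the helper function itself is invented for illustration.

```python
# Sketch of the 'dirty page too much' checkpoint trigger: when dirty
# pages reach innodb_max_dirty_pages_pct of the buffer pool, force a flush.

MAX_DIRTY_PAGES_PCT = 75   # the MySQL 5.6 default noted in the text

def needs_checkpoint(dirty_pages, buffer_pool_pages,
                     pct=MAX_DIRTY_PAGES_PCT):
    # Integer comparison avoids floating-point rounding issues.
    return dirty_pages * 100 >= buffer_pool_pages * pct

print(needs_checkpoint(74, 100))    # -> False
print(needs_checkpoint(75, 100))    # -> True
print(needs_checkpoint(800, 1000))  # -> True
```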

Since flushing dirty pages takes time to complete, the checkpoint position is recorded in the redo log only after each flush is finished.

3.2 LSN

3.2.1 LSN Concept

LSN stands for Log Sequence Number, the sequence number of the log; it occupies 8 bytes in InnoDB.

We can learn the following information through LSN:

  • Version information of the data page.
  • The total amount of logs written.
  • The location of the checkpoint.

LSNs exist in the following two locations:

  • In the redo log records.
  • There is a variable fil_page_lsn in the header of each data page to record the final LSN value of this page.

Obviously, if the LSN value in a page is smaller than the LSN value in the redo log, the page is missing some changes, and those changes must be recovered from the redo log.
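This page-LSN versus log-LSN comparison is the core of crash recovery. A sketch, with invented names and data:

```python
# Sketch: a page whose LSN is behind the redo log's LSN is missing
# changes and must be recovered by replaying redo records.

def pages_needing_recovery(page_lsns, redo_lsn):
    return [p for p, lsn in page_lsns.items() if lsn < redo_lsn]

page_lsns = {"p1": 300, "p2": 250, "p3": 300}
print(pages_needing_recovery(page_lsns, redo_lsn=300))  # -> ['p2']
```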

You can use show engine innodb status to view current InnoDB runtime information; its LOG section contains several records about the LSN.

  • Log sequence number: the current LSN in the redo log buffer.
  • Log flushed up to: the LSN flushed to the redo log files on disk.
  • Pages flushed up to: the LSN up to which data pages have been flushed to disk.
  • Last checkpoint at: the LSN of the last checkpoint position.

3.2.2 LSN Processing Flow

(1) First, modify the data page in memory and record the LSN in the data page, which we will call data_in_buffer_lsn.

(2) At (almost) the same time as the data page is modified, the redo log is written to the redo log buffer and the corresponding LSN is recorded, which we will temporarily call redo_log_in_buffer_lsn.

(3) After writing the log in the buffer, when several log flushing rules are triggered, the redo log will be flushed to the redo log file on disk, and the corresponding LSN will be recorded in the file, which is temporarily called redo_log_on_disk_lsn;

(4) Data pages cannot stay in memory forever. In some cases, a checkpoint will be triggered to flush dirty pages in memory (data dirty pages and log dirty pages) to disk. Therefore, when the checkpoint dirty page flush is completed, the LSN position of the checkpoint will be recorded in the redo log, which is temporarily called checkpoint_lsn.

(5) Recording the checkpoint position is fast, since only a flag needs to be set; flushing the data pages, however, may not be, for example when this checkpoint has many data pages to flush. In other words, flushing all the data pages takes a certain amount of time, and each data page, as it is flushed, records its current LSN, which we will temporarily call data_page_on_disk_lsn.

In the above figure, the horizontal lines from top to bottom represent: the timeline, the LSN recorded in the data page in the buffer (data_in_buffer_lsn), the LSN recorded in the data page on the disk (data_page_on_disk_lsn), the LSN recorded in the redo log in the buffer (redo_log_in_buffer_lsn), the LSN recorded in the redo log file on the disk (redo_log_on_disk_lsn), and the LSN recorded in the checkpoint (checkpoint_lsn).

Assume that at the beginning (12:0:00) all log pages and data pages have been flushed to disk and the checkpoint LSN has been recorded. At this time, their LSNs are completely consistent.

Assume that a transaction is started and an update operation is executed immediately. After the execution is completed, the data page in the buffer and the redo log both record the updated LSN value, which is assumed to be 110. At this time, if you execute show engine innodb status to view the values ​​of each LSN, that is, the position status at ① in the figure, the result will be:

log sequence number(110) > log flushed up to(100) = pages flushed up to = last checkpoint at

After that, another delete statement is executed and the LSN increases to 150. At 12:00:01, the redo log flushing rule is triggered (one of the rules is that the default log flushing frequency controlled by innodb_flush_log_at_timeout is 1 second). At this time, the LSN in the redo log file on disk will be updated to the same as the LSN in the redo log in buffer, so both are equal to 150. At this time, show engine innodb status, which is the position ② in the figure, the result will be:

log sequence number(150) = log flushed up to > pages flushed up to(100) = last checkpoint at

After that, an update statement is executed, and the LSN in the cache will increase to 300, which is position ③ in the figure.

Assume that a checkpoint occurs later, which is the position ④ in the figure. As mentioned earlier, the checkpoint will trigger the flushing of data pages and log pages to disk, but it takes a certain amount of time to complete. Therefore, before the flushing of data pages is completed, the LSN of the checkpoint is still the LSN of the previous checkpoint, but at this time the LSN of the data pages and log pages on the disk has increased, that is:

log sequence number > log flushed up to and pages flushed up to > last checkpoint at

However, the relative order of log flushed up to and pages flushed up to cannot be determined, because log flushing may be faster than, equal to, or slower than data flushing. What the checkpoint mechanism does guarantee is that data pages are never flushed ahead of the log: when the data flushing progress would overtake the log flushing progress, data flushing is temporarily paused until the log flushing progress catches up and moves ahead again.

When the data pages and log pages are flushed to disk, that is, when they reach position ⑤, all LSNs are equal to 300.

As time goes by, at 12:00:02, which is position ⑥ in the figure, the log flushing rule is triggered again. However, the log LSN in the buffer and the log LSN in the disk are consistent at this time, so the log flushing is not performed. That is, when show engine innodb status is executed at this time, all LSNs are equal.

Then an insert statement is executed. Assume that the LSN in the buffer increases to 800, which is position ⑦ in the figure. At this time, the sizes of various LSNs are the same as in position ①.

Then the commit action is performed, i.e. position ⑧. By default, commit actions trigger log flushing but not data flushing, so the result of show engine innodb status is:

log sequence number = log flushed up to > pages flushed up to = last checkpoint at

Finally, as time goes by, the checkpoint appears again, which is position ⑨ in the figure. However, this checkpoint will not trigger a log flush because the log's LSN has been synchronized before the checkpoint occurs. Assume that the data is flushed to disk extremely quickly, so fast that it is completed in an instant and the status change cannot be captured. In this case, the result of show engine innodb status will be that all LSNs are equal.
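The walk through positions ①, ②, and ⑤ above can be condensed into a toy simulation. This is a simplified model: real checkpoints flush dirty pages incrementally, and all names here are invented.

```python
# Toy timeline of the four LSN counters discussed above:
# buffer log LSN, flushed log LSN, flushed page LSN, and checkpoint LSN.

state = {"log_seq": 100, "log_flushed": 100,
         "pages_flushed": 100, "checkpoint": 100}

def modify(lsn):          # an update/delete/insert advances the buffer LSN
    state["log_seq"] = lsn

def flush_log():          # the once-per-second redo log flush
    state["log_flushed"] = state["log_seq"]

def checkpoint():         # flush dirty pages, then record the checkpoint
    state["pages_flushed"] = state["log_flushed"]
    state["checkpoint"] = state["pages_flushed"]

modify(110)               # position 1: only the buffer has advanced
assert state["log_seq"] > state["log_flushed"] == state["pages_flushed"]

modify(150)
flush_log()               # position 2: log flushed, pages still behind
assert state["log_seq"] == state["log_flushed"] > state["pages_flushed"]

modify(300)
flush_log()
checkpoint()              # position 5: everything has caught up
print(state)              # all four counters now equal 300
```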

3.3 InnoDB Recovery Behavior

When InnoDB starts, it always performs a recovery operation, regardless of how it last shut down.

A checkpoint indicates that all data-page changes up to the checkpoint LSN have been flushed to disk, so during recovery, only the portion of the log starting from the checkpoint needs to be replayed. For example, suppose the database crashes when the LSN of the last checkpoint is 10000 and a committed transaction's changes lie beyond it. When the database starts, the LSN of each data page on disk is checked; if a page's LSN is less than the LSN in the log, recovery starts from the checkpoint.

There is another situation: a checkpoint was being flushed to disk before the crash, and the data-page flushing progress had overtaken the log-page flushing progress. In that case, the LSN recorded in some data pages is greater than the LSN in the log. This is detected during the recovery process after restart, and the portion of those pages that is ahead of the log is not redone, because being ahead means those changes are already done and need no redo.
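Both recovery rules (replay from the checkpoint onward; skip pages already ahead of the log) can be combined in a short sketch. This is a conceptual model with invented record structures, not InnoDB's recovery code.

```python
# Sketch of crash recovery: replay redo records after the checkpoint,
# applying each one only to pages whose LSN is still behind it.

def recover(pages, redo_records, checkpoint_lsn):
    """pages: name -> on-disk LSN; redo_records: (lsn, page, value) in order."""
    applied = []
    for lsn, page, value in redo_records:
        if lsn <= checkpoint_lsn:
            continue                 # already durable before the crash
        if pages[page] >= lsn:
            continue                 # page was flushed ahead of the log: skip
        pages[page] = lsn            # redo the change (idempotent)
        applied.append((page, value))
    return applied

pages = {"p1": 10000, "p2": 12000}   # p2 was flushed ahead of the log
redo = [(9000, "p1", "old"), (11000, "p1", "x"), (11500, "p2", "y")]
print(recover(pages, redo, checkpoint_lsn=10000))
# -> [('p1', 'x')]
```

Only the record at LSN 11000 is replayed: the record at 9000 predates the checkpoint, and p2's on-disk LSN (12000) already covers the record at 11500.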

In addition, replaying the transaction (redo) log is idempotent: operations whose net result leaves a page unchanged require no recovery work. The binary log is not idempotent: every operation is recorded, and during recovery each recorded operation is executed again, which is much slower. For example, if a record's id starts at 2, is updated to 3, and is then set back to 2, the transaction log records the (unchanged) page, so nothing needs to be recovered; the binary log records both update operations, and both are executed again during recovery, which is slower than transaction-log recovery.

This is the end of this article about the redo log and undo log in MySQL.
