How does MySQL implement ACID transactions?

Preface

Recently, in an interview, I was asked how MySQL's InnoDB engine implements transactions — that is, how it implements the ACID properties. I didn't answer it well at the time, so I wrote up this summary for my own reference.

The four characteristics of ACID transactions

The four major properties of transactions are ACID: A for Atomicity, C for Consistency, I for Isolation, and D for Durability. Consistency is the ultimate goal; atomicity, isolation, and durability are the measures taken to ensure consistency. For that reason I will not follow the ACID order below: consistency comes last, giving the order A-D-I-C.

Atomicity (A)

Atomicity means that a transaction is an indivisible unit of work: either all of its operations succeed, or all of them fail. There is no intermediate state in which only part of the transaction has been executed.

MySQL's InnoDB engine implements atomicity through the undo log (rollback log), which makes it possible to undo all of the SQL statements that have already executed successfully when a transaction is rolled back.

The undo log is a logical log that records information about each statement's effect. When a transaction modifies the database, InnoDB generates a corresponding undo log record. If the transaction fails, or rollback is called, InnoDB uses the undo log records to restore the data to its previous state.
For example, executing an insert statement generates an undo log record for the corresponding delete; conversely, executing a delete generates an undo log record for the corresponding insert. The same holds for update statements, although the undo log for updates is also involved in MVCC: it determines which version of a row a select can see.
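As a minimal sketch of atomicity via rollback (the table and column names here are hypothetical, invented for illustration):

```sql
-- Hypothetical table for illustration.
CREATE TABLE account (id INT PRIMARY KEY, balance INT) ENGINE=InnoDB;
INSERT INTO account VALUES (1, 100);

START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE id = 1;
-- InnoDB has generated an undo log record for this UPDATE.
ROLLBACK;  -- the undo log is applied; the update is undone

SELECT balance FROM account WHERE id = 1;  -- balance is 100 again
```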

Durability (D)

Durability means that once a transaction is committed, its changes to the database are permanent; subsequent operations and even abnormal failures must not affect them.
We all know that MySQL data is ultimately stored on disk, so disk capacity determines data capacity. However, if every MySQL operation had to read and write the disk directly, the disk I/O alone would drastically reduce efficiency.

Therefore, InnoDB provides MySQL with a buffer pool, which holds in-memory copies of some of the data pages on disk.
When reading data, InnoDB first looks in the Buffer Pool; if the data is not there, it is read from disk and placed into the Buffer Pool.
When writing data, the change is first applied in the Buffer Pool; the updated pages in the Buffer Pool are periodically written back to disk (a process called flushing).

Although the Buffer Pool improves the efficiency of MySQL reads and writes, it also introduces a new problem: if MySQL crashes just after data has been updated in the Buffer Pool but before it has been flushed to disk, that data is lost and the durability of the transaction cannot be guaranteed.
To solve this problem, the redo log was introduced. Whenever data in the Buffer Pool is modified, the operation is recorded in the redo log; when the transaction commits, the redo log is flushed to disk via the fsync interface.

Because the redo log is synced to disk at commit time, if MySQL crashes, the redo log can be read back from disk to recover the data, which guarantees the durability of the transaction.

The redo log uses write-ahead logging (WAL): the log record is written first, and the Buffer Pool is updated afterwards. This ensures that any change recorded in the redo log can eventually be made durable on disk.

This deserves some explanation. The redo log is written to disk, and flushing dirty pages also writes to disk. Why record the redo log first instead of flushing the pages directly?

The main reason is that writing the redo log is much faster than flushing dirty pages.

First, the redo log is an append-only log, so writing it is sequential I/O; flushing dirty pages is random I/O, because the pages updated each time are not necessarily adjacent on disk.

Second, dirty pages are flushed in units of whole data pages (at least one page is read from disk into memory, or flushed from memory to disk, at a time). MySQL's default page size is 16KB, so any modification on a page requires flushing the entire page to disk; the redo log, by contrast, records only the changes that actually need to be written.

MySQL also has a log that records operations, called binlog. So what is the difference between redo log and binlog?

  • The first difference in function:

The redo log records changes to cached pages, ensuring that durability is preserved even if MySQL crashes; the binlog records what operations were performed and when, chiefly with timestamps, so that data can be restored to a given point in time, and it is also used for master-slave replication.

  • The second difference is at the level:

The redo log is implemented by the InnoDB storage engine (MyISAM has no redo log), while the binlog is implemented at the MySQL server level and is available to every storage engine.
In terms of content, the redo log is a physical log describing changes to data pages, while the binlog is a logical log that stores the executed SQL statements.

  • The third point is the difference in writing timing:

By default, the redo log is flushed to disk when the transaction commits. This policy can be changed with the parameter innodb_flush_log_at_trx_commit, so that the redo log does not have to wait for commit before being flushed; for example, it can be set to flush once per second.
The binlog is written when the transaction commits.
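These flush timings can be tuned. The settings below are real MySQL system variables, shown only as a sketch of the trade-off between durability and speed:

```sql
-- innodb_flush_log_at_trx_commit:
--   1 (default): write and fsync the redo log at every commit (safest).
--   0: write and flush roughly once per second (a crash can lose ~1s of commits).
--   2: write at commit, but fsync roughly once per second.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- sync_binlog plays the analogous role for the binlog:
--   1 syncs the binlog at every commit; 0 leaves syncing to the OS.
SET GLOBAL sync_binlog = 1;
```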

Isolation (I)

Atomicity and durability are properties of a single transaction, while isolation is the property that multiple concurrent transactions are isolated from and do not affect one another.
We all know that the strictest isolation level is Serializable, but the higher the isolation, the lower the performance, so the Serializable level is generally not used.
For isolation, we will discuss two situations:

  • The impact of write operations in one transaction on write operations in another transaction;
  • The impact of a write operation in one transaction on a read operation in another transaction;

In short, write-write isolation between transactions is provided by MySQL's lock mechanism, while write-read isolation between transactions is provided by the MVCC mechanism.

Lock mechanism

The main kinds of locks in MySQL are:

By function: read locks and write locks; by scope: table-level locks and row-level locks.
There are also intention locks, gap locks, and others.

Read lock: also called a "shared lock". Multiple transactions can hold a shared lock on the same data at the same time; they can read the data but not modify it.

Write lock: also called an "exclusive lock". It cannot be shared with other transactions: once a transaction holds an exclusive lock on a row, no other transaction can acquire any lock on that row, whether shared or exclusive.
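In SQL, a transaction takes these locks explicitly with locking reads. A sketch with two sessions (the table name is hypothetical):

```sql
-- Session A: shared (read) lock on the row. Other sessions may also
-- take shared locks on it, but cannot modify it until A commits.
START TRANSACTION;
SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- FOR SHARE in MySQL 8.0

-- Session B: exclusive (write) lock on the same row;
-- this blocks until session A releases its shared lock.
START TRANSACTION;
SELECT * FROM account WHERE id = 1 FOR UPDATE;
```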

Table-level lock: locks the entire table, which performs poorly. Different storage engines support different lock granularities: the MyISAM engine supports only table-level locks, while the InnoDB engine supports both table-level and row-level locks.

Row-level lock: The corresponding rows that need to be operated will be locked, with good performance.

Intention lock: an intention lock is a table-level lock. When a transaction adds an exclusive or shared lock to some row in a table, it also sets an intention lock on the table. Then, when another transaction tries to lock the whole table and finds an intention lock already present, it blocks immediately. Without intention locks, a transaction wanting to lock the table would have to scan every row to check whether any row is already locked.

Gap lock: a gap lock is added to prevent phantom reads. It locks a gap that contains no records: the space between two index records, or before the first index record, or after the last one (but not the records themselves). While the gap lock is held, inserts into the gap are blocked, which ensures that two identical queries within one transaction return the same number of rows.

Next-Key Lock: a Next-Key lock is the combination of a record (row-level) lock and a gap lock. A gap lock does not lock the record itself, while a Next-Key lock locks the record as well.

For example, suppose a table contains the following records (with an index on number):

id  name        number
1   Xiao Ming   16
2   Xiao Hong   17
3   Xiao Zhang  20
4   Xiao Wang   20

When executing SQL: select * from table where number = 17 for update, pure gap locks would cover only the gaps (16, 17) and (17, 20) on number. The Next-Key Lock locks those gaps and also places a record lock on number = 17 itself.
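Under the default REPEATABLE READ isolation level, the blocking effect can be observed with two sessions. This is a sketch; the table and row from the example above are assumed:

```sql
-- Session A: takes a next-key lock on the record number = 17
-- plus gap locks on (16, 17) and (17, 20).
START TRANSACTION;
SELECT * FROM t WHERE number = 17 FOR UPDATE;

-- Session B: this INSERT falls into the locked gap (17, 20)
-- and blocks until session A commits or rolls back,
-- which prevents phantom rows from appearing in A's result set.
INSERT INTO t (id, name, number) VALUES (5, 'Xiao Li', 18);
```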

The locking mechanism ensures the isolation of write operations between multiple transactions, while the guarantee of read and write operations between multiple transactions needs to be guaranteed by the MVCC mechanism.

MVCC Mechanism

MVCC stands for Multi-Version Concurrency Control, a multi-version concurrency control protocol.

MVCC is implemented mainly by adding hidden columns to each row and by using the undo log. The hidden columns include the ID (an incrementing version number) of the transaction that created the row version, a deletion marker, and a roll pointer to the undo log record, among others.

So how does MVCC ensure read-write isolation? It mainly relies on two kinds of reads: snapshot reads and current reads.

  • Snapshot read:

To preserve concurrency, MVCC does not take locks when reading. An ordinary select (without locking clauses) first reads the version number visible at that point. If another transaction modifies the row before the select completes, the row's version number becomes larger than the one the select started with. To keep the data it reads consistent, the select returns only versions less than or equal to its own; these historical versions are reconstructed from the undo log.

  • Current read:

When executing insert, update, or delete, the latest version of the data is read and the current record is locked, ensuring that its version is not modified by other transactions during the operation.

For example, an ordinary select is a snapshot read: what it reads may be a historical version of the data.

insert, update, delete, select ... lock in share mode, and select ... for update are current reads: they read the latest version of the data.
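The difference can be sketched with two sessions under REPEATABLE READ (table name hypothetical, continuing the earlier account example):

```sql
-- Session A:
START TRANSACTION;
SELECT balance FROM account WHERE id = 1;  -- snapshot read, e.g. 100

-- Session B (autocommit) commits an update meanwhile:
UPDATE account SET balance = 200 WHERE id = 1;

-- Session A again:
SELECT balance FROM account WHERE id = 1;             -- still 100: snapshot read
SELECT balance FROM account WHERE id = 1 FOR UPDATE;  -- 200: current read
ROLLBACK;
```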

In fact, setting the isolation level to Serializable would also achieve read-write isolation, but concurrency would suffer greatly, so it is rarely used. MVCC, by contrast, takes no locks on reads and locks only on writes, which improves concurrency.

The MVCC mechanism ensures read-write isolation between multiple transactions, thereby achieving transaction isolation.

Consistency (C)

Consistency means that the data is consistent before and after the transaction executes: data integrity is not violated and the data remains in a legal state.

  • Consistency constraints include:

Index integrity (e.g. a unique index contains no duplicates), column constraints (field type, length, and size meet the requirements), foreign key constraints, and so on.

  • Measures to achieve consistency:

Guarantee atomicity, durability, and isolation; if any of these cannot be guaranteed, consistency cannot be guaranteed either. At the database level, beyond those properties, there are also field-level constraints: an integer column rejects character values, and strings, times, and so on cannot exceed the column's definition. At the application level, however, developers must ensure consistency themselves.
For example, when transferring money from A to B, the amount deducted from A must equal the amount added to B. If A is debited without crediting B, consistency is broken.
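The transfer can be sketched as a single transaction (table and column names hypothetical); atomicity ensures both updates apply together or not at all:

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 50 WHERE id = 1;  -- debit A
UPDATE account SET balance = balance + 50 WHERE id = 2;  -- credit B
COMMIT;  -- both changes become durable together
```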

In addition, MySQL ensures consistency between the redo log and the binlog through two-phase commit.

When introducing durability above, we explained the differences between the redo log and the binlog. The third difference noted that, by default, both the redo log and the binlog are written when a transaction commits. So how do they stay consistent with each other? Which log's write determines whether a transaction committed successfully?
MySQL keeps the two logs consistent through two-phase commit.

  • Phase one (prepare):

Write the redo log to disk and set its state to prepare; do nothing with the binlog yet.

  • Phase two (commit):

1. Generate the binlog for the transaction's operations and write the binlog to disk.

2. Call the engine's transaction commit interface, changing the redo log state from prepare to commit; the transaction is now committed.
This two-phase commit guarantees the consistency of the two logs.
If MySQL crashes while the redo log is still in the prepare phase, the transaction is rolled back on recovery.
If the crash occurs after the commit phase has begun, the transaction is recovered: the local data is recovered through the redo log, and in a master-slave setup the slaves restore the data from the binlog.
This is why a transaction counts as successfully committed once its binlog has been written: the binlog is what replication and point-in-time recovery rely on. If only the redo log had been written and no binlog generated, crash recovery via the redo log would yield one version of the data, while a slave restored from the binlog would hold the data as of an earlier point in time, leading to inconsistency.

Summary

Among the ACID properties of MySQL transactions, consistency is the ultimate goal.
The measures that guarantee it are:

  • A - Atomicity: guaranteed by the undo log (rollback after an exception or execution failure).
  • D - Durability: guaranteed by the redo log (even if MySQL crashes or loses power, the data can eventually be persisted to disk through the redo log).
  • I - Isolation: write operations between transactions are isolated by MySQL's lock mechanism, and read-write operations between transactions are isolated by the MVCC mechanism (snapshot read, current read).
  • C - Consistency: the ultimate goal, to be ensured at both the database and application levels. In addition, MySQL uses two-phase commit between the redo log and the binlog to keep the logs consistent.

This concludes the discussion of how MySQL implements the ACID properties of transactions.
