How does MySQL achieve multi-version concurrency?

How does MySQL achieve multi-version concurrency?

MySQL multi-version concurrency

1. Multi-version concurrency control

We know that read uncommitted will cause dirty reads, phantom reads, and non-repeatable reads, read committed will cause phantom reads and non-repeatable reads, repeatable reads may have phantom reads, and serialization will not have these problems.

So how does InnoDB solve these problems? Or, have you ever thought about the underlying and fundamental causes of dirty reads, phantom reads, and non-repeatable reads?

This is the protagonist we are going to talk about today - MVCC (Multi-Version Concurrent Controll), also known as multi-version concurrent control. InnoDB is a storage engine that supports multiple concurrent transactions. It allows read-write operations in the database to be performed concurrently, avoiding read blocking due to locking.

Thanks to MVCC, when transaction B updates the data with id=1, transaction A's read operation with id=1 will not be blocked. The reason behind non-blocking is consistent reading without locking . So what is consistent reading?

1. Consistent read

Simply put, when a query is performed, InnoDB will create a snapshot of the database at the current point in time. After the snapshot is created, the current query can only perceive the transaction changes submitted before the snapshot was created. Transactions submitted after the snapshot was created will not be perceived by the current query.

Of course, data updated by the current transaction itself is an exception. The rows modified by the current transaction can get the latest data when they are read again. For other rows, the version read is still the version at the time of taking the snapshot .

This snapshot is the key to InnoDB's transaction isolation level.

At the (Read Committed isolation level, each consistent read in a transaction regenerates a snapshot. At the Repeatable Read isolation level, all consistent reads in a transaction will only use the snapshot generated by the first consistent read .

This is why, in the above figure, after transaction B commits the transaction, the changes can be seen under the read committed isolation level, but not under the repeatable read isolation level. This is essentially because the read committed snapshot is regenerated .

In the Read Committed and Repeatable Read isolation levels, the SELECT statement will use a consistent read by default, and no locks will be added in the consistent read scenario. Other modification operations can also be performed synchronously, greatly improving the performance of MySQL. This is the implementation principle of MVCC multi-version concurrency control. This type of reading is also called snapshot reading .

What if I want to see the submission of other transactions immediately during a transaction? There are two ways:

(1) Use the read committed isolation level (2) Lock the SELECT statement, either a shared lock or an exclusive lock. More specifically, FOR SHARE and FOR UPDATE
Of course, if the lock added to the corresponding record in the second method is mutually exclusive with the lock added by SELECT , SELECT will be blocked. This type of reading is also known as current reading .

After understanding the above explanation, next time someone asks you how MVCC is implemented, you can explain it from the perspective of consistent reads (snapshot reads) and current reads, and also explain the refresh mechanism for consistent read snapshots under different isolation levels.

But I think it is not enough, I should continue to understand it in depth. Because we only know a snapshot, how is it implemented at the bottom layer? Actually, I still don’t know.

2. In-depth understanding of the principle of consistent reading

It stands to reason that different consistent reads may read different versions of data, so these must be stored in MySQL, otherwise they cannot be read. Yes, these data are stored in the InnoDB tablespace, and more specifically in the Undo tablespace.

The key to implementing MVCC in InnoDB is actually three fields, and each row in the data table has these three fields:

  • The DB_TRX_ID field has 6 bytes and is used to store the unique identifier of the transaction that last inserted or updated the row of data. You may ask, only inserts and updates? What about deleting? In fact, within InnoDB, deletion is actually an update operation , which only updates a specific flag bit in the row to mark it as deleted.
  • The DB_ROLL_PTR field has 7 bytes. You can call it the rollback pointer , which points to a specific Undo Log stored in the rollback segment. Even if the current row of data has been updated, we can still use the rollback pointer to get the historical version data before the update.
  • The DB_ROW_ID field has 6 bytes. It is the unique identifier given by InnoDB to the row of data. This unique identifier will increase monotonically when new data is inserted, just like the monotonically increasing primary key defined when defining the table structure. DB_ROW_ID will be included in the clustered index, but not in other non-clustered indexes .

Through DB_ROLL_PTR, you can get the latest Undo Log, and then each corresponding Undo Log points to its previous Undo Log. In this way, different versions can be connected to form a linked list. Different transactions select different versions from the linked list to read according to requirements and rules, thereby realizing multi-version concurrency control, as shown in the following figure:

Some people may not have any idea about Undo Log, just remember this:

Undo Log records the data status before the transaction started, which is a bit like a commit in Git. You submit a commit, and then start to work on a very complex requirement. Then, you lose your temper and don’t want to make these changes anymore. You can directly git reset --hard $last_commit_id to roll back. The last commit can be regarded as the Undo Log. If you are interested, you can take a look at the MySQL crash recovery process based on Redo Log and Undo Log.

2. Composition of Undo Log

Some people may wonder, shouldn’t the Undo Log be deleted after the transaction is committed? Why can I still check previous data through MVCC?

In fact, in InnoDB, Undo Log is divided into two parts:

  • Insert Undo Log
  • Update Undo Log

For Insert Undo Log, it will only be used for rollback when an error occurs in a transaction, because once the transaction is committed, Insert Undo Log is completely useless, so Insert Undo Log will be deleted after the transaction is committed.

Update Undo Log is different. It can be used for MVCC consistent reads and provide a data source for requests of different versions. So, does this mean that Update Undo Log cannot be removed at all? Because you don't know when a consistent read request will come, causing the space occupied to become larger and larger.

Yes, but not entirely.

Consistent read is essentially to handle multiple transactions concurrently, and different data versions need to be given to different transactions on demand. Therefore, if there is no transaction currently, Update Undo Log can be killed.

This concludes this article about how MySQL achieves multi-version concurrency. This is the end of the article. For more relevant MySQL multi-version concurrency content, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Detailed explanation of mysql filtering replication ideas
  • MySQL foreign key (FOREIGN KEY) usage case detailed explanation
  • How to use stored procedures in MySQL to quickly generate 1 million records
  • Python interface automation briefly analyzes the pymysql database operation process
  • MySQL transaction control flow and ACID characteristics
  • Mysql uses stored procedures to quickly add millions of data sample code
  • Implementation of removing overlapping time and calculating time difference in MySQL
  • In MySQL database, datetime, bigint, and timestamp are used to represent time selection. Which one is the most efficient for storing time?
  • Specific use of MySQL global locks and table-level locks
  • Analysis of MySQL crash recovery based on Redo Log and Undo Log

<<:  Front-end performance optimization - the pain points that front-end engineers have to talk about

>>:  Use CSS's clip-path property to display irregular graphics

Recommend

In-depth analysis of Vue's responsive principle and bidirectional data

Understanding object.defineProperty to achieve re...

mysql backup script and keep it for 7 days

Script requirements: Back up the MySQL database e...

Timeline implementation method based on ccs3

In web projects we often use the timeline control...

$nextTick explanation that you can understand at a glance

Table of contents 1. Functional description 2. Pa...

my.cnf (my.ini) important parameter optimization configuration instructions

MyISAM storage engine The MyISAM storage engine i...

Vue implements user login and token verification

In the case of complete separation of the front-e...

React Fragment Introduction and Detailed Usage

Table of contents Preface Motivation for Fragment...

How to configure pseudo-static and client-adaptive Nginx

The backend uses the thinkphp3.2.3 framework. If ...

JS implements the snake game

Table of contents 1. Initialization structure 2. ...

Detailed steps for manually configuring the IP address in Linux

Table of contents 1. Enter the network card confi...

jQuery+Ajax to achieve simple paging effect

This article shares the specific code of jquery+A...