A brief analysis of how MySQL implements transaction isolation

A brief analysis of how MySQL implements transaction isolation

1. Introduction

As we all know, when querying data at the RR isolation level of MySQL, it can ensure that the data is not affected by other transactions. However, at the RC isolation level, as long as other transactions are committed, the data after the commit will be read. So what is the principle of transaction isolation? How is this achieved? That must be through the MVCC mechanism ( Multi-Version Concurrency Control ).

Note: The reason why MySQL's InnoDB engine can support high-performance concurrency is due to MySQL's MVCC mechanism (thanks to undo log, Read-View, etc.), but this article does not introduce MVCC in detail.

Reference: "MySQL Practice 45 Lectures" series. Although the explanation is relatively clear, it still requires understanding. For example, I think the part about view arrays is not explained clearly, so I record it in combination with the materials and my own insights!

2. RC and RR isolation levels

We open the RC and RR isolation levels respectively. First, assume that there is an account table. Before transaction ABC is started, the balance in the account is 1, that is,

select balance from account =1; # The result is 1

2.1. Query results under RR transaction isolation level

When three transactions are opened at the RR transaction isolation level, the following operations are performed in different time periods

  • Transaction A (explicitly open the transaction and manually commit it): Query the balance
  • Transaction B (explicitly start the transaction, manually commit): add 1 to the balance of id=1
  • Transaction C (transaction not explicitly started, automatically committed): add 1 to the balance of id=1

We divide the time into three stages logically and analyze the results.

  • Phase 1: Transaction A starts the transaction immediately, followed by Transaction B, and then Transaction C updates the balance to 2 successfully. The current balance = 2.
  • Phase 2: Transaction B updates the value of balance. At this time, it first reads the current latest value of balance as 2, then sets balance = balance + 1 successfully, and the current balance = 3;
  • Phase 3: Transaction A queries the value of balance, which is now 1 (why is it equal to 1? How is it achieved? Shouldn’t it be the current latest value of 3? This is the focus of this blog post). Finally, commit ends the transaction, and then transaction B also commits to end the transaction.

Finally, the result of transaction A reading balance is 1. Naturally, RR stands for repeatable read, that is, the data seen by a transaction during execution is always consistent with the data seen when the transaction starts. Whether the current transaction is committed or not, it will not affect the data. I only need to read data based on the snapshot, which is snapshot read. But what we want to discuss is how to implement it under the MVCC mechanism?

Note: The begin/start transaction command is not the starting point of a transaction. The transaction is actually started after the first statement that operates the InnoDB table is executed. If you want to start a transaction immediately, you can use the start transaction with consistent snapshot command.

2.2. Query results under RC transaction isolation level

Similarly, we start transaction ABC under RC isolation and observe the final balance result of transaction A.

Finally, the result of balance read by transaction A is 2. Naturally, RC stands for read committable, which literally means that as long as other transactions are committed, I can immediately read the latest current value of the current transaction. This is the current read. But what we want to discuss is how to implement it under the MVCC mechanism?

In fact, this is because the consistent read view used in implementing MVCC is used to support the implementation of RC (Read Committed) and RR (Repeatable Read) isolation levels.

3. Implementation of transaction isolation in MVCC

Before discussing how MVCC achieves transaction isolation, we need to know concepts such as view arrays and consistent views to help us better understand how MVCC helps transactions achieve isolation.

3.1. Multiple versions of data rows ROW

Each transaction in InnoDB has a unique transaction ID, called transaction id. It is applied to the InnoDB transaction system at the beginning of the transaction and is strictly incremented in the order of application.

Each row of data also has multiple versions. Each time a transaction updates data, a new data version is generated, and the transaction id is assigned to the transaction ID of this data version, recorded as row trx_id. At the same time, the old data version must be retained, and in the new data version, there should be information to directly obtain it (found through the undo_log file).

That is to say, a row of records in a data table may actually have multiple versions (rows), and each version has its own row trx_id.

Draw the following figure to deepen your understanding of the multi-version control process of a data row ROW that undergoes three update transactions at a certain moment.

From the figure we can get:

  • ROW has four versions V1-V4. After three updates of balance, the latest version is V4. The current balance has been updated to 4, which is the latest value.
  • InnoDB assigns the transaction id generated by each update transaction to row trx_id;
  • Through undo_log, you can roll back from V4 to V1, and find that the balance of V1 is 1, which is the undo_log rollback version.

Understanding the multi-version principle and implementation of data row ROW can help us understand how InnoDB defines and creates snapshots!

3.2 View Array

The following parts are from the original sentences in the materials. Especially the parts in red may be difficult to understand, so you need to combine your own understanding and draw pictures.

This is how InnoDB defines snapshots when a transaction is started. Which transaction operations can I ignore and which ones must be saved in the snapshot? It can be understood as follows: a transaction only needs to declare at the time of startup, "Based on the time I start, if a data version is generated before I start, I will recognize it; if it is generated after I start, I will not recognize it, and I must find its previous version."

In terms of implementation, InnoDB constructs an array for each transaction to save all transaction IDs that are currently "active" at the moment the transaction is started. "Active" means that it has been started but not yet submitted. The minimum value of the transaction ID in the array is recorded as the low water mark, and the maximum value of the transaction ID that has been created in the current system plus 1 is recorded as the high water mark. This view array and the high water mark constitute the consistent view (read-view) of the current transaction.

My understanding of low water level and high water level:

Low water mark = the minimum ID value of all currently started but uncommitted transaction sets = the minimum ID value of the last transaction started but not committed before the current transaction (the minimum ID value of all active transactions)

High water mark = current transaction ID (current ROW version number/row trx_id) = maximum value of transaction ID that has been created + 1

For example: still take the three ABC transactions under the above RR isolation level as an example

  • Before transaction A starts, there is only one active transaction in the system with ID 99;
  • The version numbers of transactions A, B, and C are 100, 101, and 102 respectively, and there are only these four transactions in the current system;
  • Before the three transactions started, the row trx_id of the row of data (id,balance)=(1,1) was 90.

Thus, the view array of transaction A is [99], the view array of transaction B is [99,100], and the view array of transaction C is [99,100,101]. That is, the general formula for the view array is: [{a collection of active transaction IDs at the moment the current transaction is opened}].

The visibility rules of data versions are based on the comparison results of rowtrx_id and consistency view, so we must also understand the consistency view.

3.3. Consistency View

By understanding the view array, consistent view becomes easier, that is, this view array and the high water mark constitute the consistent view (read-view) of the current transaction.

Still taking the three ABC transactions under the above RR isolation level as an example

  • Before transaction A starts, there is only one active transaction in the system with ID 99, so the set of active transactions at the moment transaction A starts is [99];
  • The version numbers of transactions A, B, and C are 100, 101, and 102 respectively, and there are only these four transactions in the current system, so the high watermarks of transactions A, B, and C are 100, 101, and 102 respectively;
  • Before the three transactions started, the row trx_id of the row of data (id,balance)=(1,1) was 90.

In this way, the consistency view of transaction A is [99,100], the consistency view of transaction B is [99,100,101], and the consistency view of transaction C is [99,100,101,102]. That is, the general formula for the consistent view is: [{the set of active transaction IDs at the moment the current transaction is opened}, the current row trx_id].

Analyze the above flowchart results:

The first valid update version is transaction C, which updates balance=2. At this time, the latest version rowtrx_id=102, while the latest version rowtrx_id of the previous active transaction before transaction ABC is 99, so 99 has become historical version 1 at this time;

The second valid update version is transaction B, which updates balance=3. At this time, the latest version rowtrx_id=101, and rowtrx_id=102 becomes historical version 1, and rowtrx_id=99 becomes historical version 2.

When transaction A queries, transaction B has not been submitted, but the generated (id, balance) = (1, 3) has become the latest version. When transaction A reads data, the consistent view is [99, 100]. The read data is cut from the current version and then compared with row trx_id, so there will be the following process:

  • When (1,3) is found, it is determined that row trx_id=101 is larger than the high water mark, is in the red area, and is not visible;
  • Next, find the previous historical version and see row trx_id=102, which is larger than the high water mark and in the red area, so it is not visible.
  • Looking further forward, we finally found (1,1), whose row trx_id = 90, which is smaller than the low water mark and is in the green area, so it is visible.

Finally, no matter when transaction A queries, the data it sees is the snapshot data (1, 1) generated by the consistent view [99, 100], that is, the data when rowtrx_id=90. This is called a consistent read.

Summarize:

For a transaction view, in addition to its own updates being always visible, there are three situations:

  • The version has not been submitted and is not visible;
  • The version has been submitted, but it is submitted after the view is created and is not visible;
  • The version has been submitted and is visible before the view is created.

Now, we use this rule to judge the query results in the figure. The view array of the query statement of transaction A is generated when transaction A is started. At this time:

  • (1,3) has not been submitted yet, belongs to case 1, and is not visible;
  • (1,2) Although submitted, it is submitted after the view array is created, which belongs to case 2 and is not visible;
  • (1,1) is submitted before the view array is created and is visible.

3.4 Current Read and Snapshot Read

3.4.1 Current Read and Snapshot Read Rules

Of course, according to the logic of this consistent read, transaction B is updated after transaction C effectively updates balance=2, but the view array of transaction B is generated in transaction C, so theoretically, shouldn’t transaction B see the data (id, balance)=(1, 1) (snapshot/historical version)? The current version (1, 2) data cannot be seen. Why does the data of transaction B become (1, 3) directly after updating the balance?

If transaction B selects data once before updating, the value it sees is indeed balance=1, but the update cannot be performed on the historical version, otherwise the update of transaction C will be lost. Therefore, the update operation reads the current version first and then updates.

In other words, there is a rule: data is updated by reading first and then updating. The read is to read the latest value, which is called "current read". If you only query without reading, you will read the current snapshot, which is called "snapshot read". So before transaction B updates balance, it first queries the latest version (1, 2) and then updates it to (1, 3). The snapshot data queried by transaction A is (1, 1), not the latest version (1, 3).

3.4.2 Explanation of Current Read and Snapshot Read

Current read: Operations such as select lock in share mode (shared lock), select for update; update, insert, delete (exclusive lock) are all current reads. That is, it reads the latest version of the record. When reading, it must ensure that other concurrent transactions cannot modify the current record, and the read record will be locked.

Snapshot read: An unlocked select operation is a snapshot read, that is, an unlocked non-blocking read. The premise of a snapshot read is that the isolation level is not a serial level. A snapshot read at the serial level will degenerate into a current read. It is based on multi-version control, so the snapshot read may not necessarily read the latest version of the data, but may be a previous historical version (snapshot data).

3.4.3 View rules under RC read commit

The logic of read committed is similar to that of repeatable read. The main differences between them are:

In the repeatable read isolation level, you only need to create a consistent view at the beginning of a transaction, and other queries in the transaction will share this consistent view. In the read committed isolation level, a new view will be recalculated before each statement is executed. In this case, start transaction with consistent snapshot is equivalent to the normal starttransaction/begin. So in the RC isolation level, the data queried by transaction A and transaction B are as follows:

Transaction C immediately updates balance = 2 and then automatically commits it to generate the latest version (1, 2). At this time, the view data (1, 2) is recalculated. Transaction B finds that the latest version is (1, 2) and then updates it to version (1, 3) as the latest version. The balance selected by transaction B at this time is 3 (after transaction B updates balance = 3, a new view is immediately calculated, and the select is the data obtained based on this view), not 1. At this time, transaction B has not been submitted yet and is invisible to transaction A, so transaction A reads the latest version submitted by transaction C (1, 2).

The above is a brief analysis of the details of how MySQL implements transaction isolation. For more information about MySQL transaction isolation, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Description of the default transaction isolation level of mysql and oracle
  • Example of viewing and modifying MySQL transaction isolation level
  • Detailed explanation of Mysql transaction isolation level read commit
  • Analyzing the four transaction isolation levels in MySQL through examples
  • Detailed explanation of the four transaction isolation levels in MySQL
  • Detailed explanation of MySQL database transaction isolation levels
  • Detailed explanation and comparison of the four transaction isolation levels in MySQL
  • In-depth analysis of MySQL transaction isolation and its impact on performance
  • Tutorial on the relationship between Innodb transaction isolation level and lock in MySQL
  • Introduction to MySQL database transaction isolation level (Transaction Isolation Level)

<<:  Detailed description of the function of meta name="" content="

>>:  Details on how to use class styles in Vue

Recommend

Summary of Mysql slow query operations

Mysql slow query explanation The MySQL slow query...

Based on the special characters in the URL escape encoding

Table of contents Special characters in URLs URL ...

This article takes you to explore NULL in MySQL

Table of contents Preface NULL in MySQL 2 NULL oc...

Zookeeper unauthorized access test problem

Table of contents Preface Detect Zookeeper servic...

How to display small icons in the browser title bar of HTML webpage

Just like this effect, the method is also very si...

Basic use of subqueries in MySQL

Table of contents 1. Subquery definition 2. Subqu...

Specific use of Linux which command

We often want to find a file in Linux, but we don...

WeChat applet implements calculator function

This article shares the specific code for the WeC...

Solution to Nginx SSL certificate configuration error

1. Introduction When a web project is published o...

Let's talk about the size and length limits of various objects in MySQL

Table of contents Identifier length limit Length ...

Solution for applying CSS3 transforms to background images

CSS transformations, while cool, have not yet bee...

Solution to the IP address not being displayed under Linux

Table of contents Preface Solution: Step 1 Step 2...

Practical MySQL + PostgreSQL batch insert update insertOrUpdate

Table of contents 1. Baidu Encyclopedia 1. MySQL ...

How to build nfs service in ubuntu16.04

Introduction to NFS NFS (Network File System) is ...

Use CSS variables to achieve cool and amazing floating effects

Recently, I found a fun hover animation from the ...