Can MySQL's repeatable read level solve phantom reads?

Can MySQL's repeatable read level solve phantom reads?

introduction

When I was learning more about database theory, I learned about the possible problems with different isolation levels of things. For a better understanding, we tested and reproduced these problems in the MySQL database. Dirty reads and non-repeatable reads can be easily reproduced under the corresponding isolation levels. However, as for phantom reads, I found that they did not occur under the repeatable read isolation level. At that time, I thought that maybe MySQL did something to deal with phantom reads?

test:

Create a test table dept:

CREATE TABLE `dept` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(20) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8

insert into dept(name) values("Logistics Department")

Thing 1 Thing 2
begin begin
select * from dept
- insert into dept(name) values("R&D Department")
- commit
select * from dept
commit

According to the above process, it is expected that the first select of transaction 1 will query one piece of data, and the second select will query two pieces of data (including the data submitted by transaction 2).

However, in actual testing, it was found that the second select actually only queries and processes one piece of data. However, according to the implementation of repeatable read in database theory (exclusive locks and shared locks), this should not be the case.

Before we understand the actual reasons, let’s review the relevant theories of things.

Database Principles Theory

thing

Transaction generally refers to something to be done or done. In computer terms, it refers to a program execution unit that accesses and possibly updates various data items in a database. A transaction consists of all operations performed between the start of a transaction and the end of a transaction. In a relational database, a transaction can be a set of SQL statements or an entire program.

Why do we have things?

A database transaction typically consists of a sequence of operations that read or write to the database. It exists for two purposes:

  • It provides a method for database operations to recover from failure to a normal state, and also provides a method for the database to maintain consistency under abnormal conditions.
  • When multiple applications access the database concurrently, an isolation method can be provided between these applications to ensure that their operations do not interfere with each other.

Characteristics of things

Transactions have four characteristics: atomicity, consistency, isolation, and durability. These four properties are often referred to as ACID properties.

  • Atomicity:
    A transaction should be an indivisible unit of work, and the operations included in the transaction are either all successful or all unsuccessful.
  • consistency:
    A transaction must change the database from one consistent state to another consistent state. Consistency and atomicity are closely related.
  • Isolation:
    The execution of a transaction cannot be interfered with by other transactions. That is, the operations and data used within a transaction are isolated from other concurrent transactions before the transaction is committed, and concurrently executed transactions cannot affect each other.
  • Durability:
    Once a transaction is successfully committed, the changes it makes to the data in the database should be permanent. Subsequent other operations or failures should not have any effect on it.

Several properties between things are not a set of equivalent concepts:

If there is only one thing at any time, it is naturally isolated, and consistency can be achieved as long as atomicity is guaranteed.

If there is concurrency, atomicity and isolation must be guaranteed to ensure consistency.

Problems in concurrent database transactions

If transaction isolation is not considered, the following problems may occur:

  • Dirty read: Dirty read refers to reading data from another uncommitted transaction during a transaction processing process. When a transaction is modifying a piece of data multiple times, and these modifications have not yet been committed in this transaction, if a concurrent transaction accesses the data, the data obtained by the two transactions will be inconsistent.
  • Non-repeatable read: Non-repeatable read means that for a certain piece of data in the database, multiple queries within a transaction return different data values ​​(here, different means that the content of one or more pieces of data is inconsistent, but the number of data pieces is the same). This is because during the query interval, the data needed by the transaction was modified and committed by another transaction. The difference between non-repeatable read and dirty read is that dirty read is when a transaction reads dirty data that has not been committed by another transaction, while non-repeatable read is when data committed by other transactions is read. Note that in some cases non-repeatable reads are not a problem.
  • Phantom read: Phantom read is a phenomenon that occurs when transactions are not executed independently. For example, transaction T1 modifies a data item in all rows of a table from "1" to "2". At this time, transaction T2 inserts another row of data items into the table, and the value of this data item is still "1" and is submitted to the database. If the user operating transaction T1 checks the data that has just been modified, he will find that there is still a row that has not been modified. In fact, this row was added from transaction T2, which is like an illusion. This is a phantom read. Both phantom reads and non-repeatable reads read another committed transaction (which is different from dirty reads). The difference is that non-repeatable reads may occur in update and delete operations, while phantom reads occur in insert operations.

Exclusive lock, shared lock

Exclusive lock, also known as X lock, write lock.

Shared lock, also known as S lock, read lock.

The relationship between read-write locks is as follows:

  • A transaction adds an S lock to a data object O. It can read O but cannot update it. During the locking period, other transactions can add S locks to O, but cannot add X locks.
  • A transaction adds an X lock to a data object O and can read and update O. During the locking period, other transactions cannot add any locks to O.

That is, the relationship between read-write locks can be summarized as: multiple reads and single write

Isolation level of transactions

There are several isolation levels in transactions:

  • Read Uncommitted: Solve the problem of lost updates. If a transaction has started a write operation, other transactions are not allowed to write at the same time, but other transactions are allowed to read this row of data. This isolation level can be achieved through "exclusive write lock", that is, if a transaction needs to modify certain data, it must add an X lock to the data, and does not need an S lock to read the data.
  • Read Committed: Solve the dirty read problem. Transactions that read data allow other transactions to continue accessing the row of data, but uncommitted write transactions will prohibit other transactions from accessing the row. This can be achieved through "instant shared read lock" and "exclusive write lock", that is, when a transaction needs to modify certain data, an X lock must be added to the data, and an S lock must be added when reading data. When the data is read, the S lock is released immediately without waiting for the transaction to end.
  • Repeatable Read: Non-repeatable reads and dirty reads are prohibited, but phantom reads may sometimes occur. Transactions that read data will prohibit write transactions (but allow read transactions), and write transactions will prohibit any other transactions. Mysql uses this isolation level by default. This can be achieved through "shared read lock" and "exclusive write lock", that is, when a transaction needs to modify certain data, an X lock must be added to the data, and an S lock must be added when reading data. When the data is read, the S lock is not released immediately, but is released after the transaction is completed.
  • Serializable: Solve the problem of phantom reading. Provides strict transaction isolation. It requires that transactions be executed serially. Transactions can only be executed one after another and cannot be executed concurrently. Transaction serialization cannot be achieved simply through "row-level locks". Other mechanisms must be used to ensure that newly inserted data is not accessed by the transaction that has just executed the query operation.

Implementation of isolation levels in MySQL

The above content explains some concepts of database theory, but in databases such as MySQL and ORACLE, for performance reasons, they are not implemented completely according to the theories introduced above.

MVCC

Multi-Version Concurrency Control (MVCC) is a way to implement isolation levels in MySQL based on optimistic locking theory. It is used to implement read committed and repeatable read isolation levels.

Implementation (isolation level is repeatable read)

Before talking about how to implement it, let’s introduce two concepts:

System version number: An increasing number. The system version number will automatically increase every time a new transaction is started.

Transaction version number: The system version number when the transaction starts.

In MySQL, two fields are added after each record in the table:

Create version number: When creating a row of data, assign the current system version number as the creation version number

Delete version number: When deleting a row of data, the current system version number is assigned as the deletion version number

SELECT

The rule for reading data during select is: the creation version number <= the current transaction version number, and the deletion version number is empty or > the current transaction version number.

The creation version number <= the current transaction version number ensures that the retrieved data will not contain data created in transactions started later. This is why we don't find the data added later in the initial example.

The deletion version number is empty or > the current transaction version number, which ensures that the data has not been deleted at least before the transaction is started, and is data that should be checked out.

INSERT

When inserting, assign the current system version number to the creation version number field.

UPDATE

Insert a new record, save the current transaction version number as the row creation version number, and save the current transaction version number to the originally deleted row. In fact, the update here is implemented through delete and insert.

DELETE

When deleting, the current system version number is assigned to the deletion version number field to indicate in which transaction the row of data will be deleted, even if the data is not actually deleted when the commit is performed. The data will not be found even if the data is opened according to the select rule.

Does MVCC really solve phantom reading?

From our initial test examples and the theoretical support above, it seems that the phantom read problem has been solved through MVCC in MySQL. Since serial read seems to be meaningless, we continue testing with doubts.

Pre-test data:

Thing 1 Thing 2
begin begin
select * from dept
- insert into dept(name) values("R&D Department")
- commit
update dept set name="Finance Department" (If you don't want to be fired at work, you must write the where condition)
commit

Based on the above results, we expect the following result:

id name
1 Finance Department
2 R&D Department

But in fact, our experience is:

We originally hoped to change the department of the first data to Finance, but in the end both data were modified. This result tells us that the MySQL repeatable read isolation level does not completely solve the problem of phantom reads, but solves the phantom read problem when reading data. However, the phantom read problem still exists for modification operations, which means that MVCC's solution to phantom reads is not thorough.

Snapshot read and current read

When the above situation occurs, we need to know why it happens. After consulting some information, I found that at the RR level, although the MVCC mechanism makes data repeatable, the data we read may be historical data, not the latest data in the database. This method of reading historical data is called snapshot read, and the method of reading the latest version of the database is called current read.

select snapshot read

When performing a select operation, InnoDB will perform a snapshot read by default, and will record the result of this select. The data of this snapshot will be returned in subsequent selects. Even if other transactions are committed, it will not affect the data of the current select, thus achieving repeatable read. The snapshot is generated when select is executed for the first time. That is to say, suppose A starts a transaction and then does not perform any operations. At this time, B inserts a piece of data and commits it. At this time, A executes select, then the returned data will include the data added by B. It doesn't matter if other transactions are committed afterwards, because the snapshot has been generated and subsequent selects are based on the snapshot.

Current Reading

For operations that modify data (update, insert, delete), the current read mode is used. When executing these operations, the latest records will be read, and even data submitted by other transactions can be queried. Suppose you want to update a record, but the data has been deleted and committed in another transaction. If you update it, there will be a conflict, so you need to know the latest data when updating. It is precisely because of this that the situation we tested above occurred.

The current read of select needs to be manually locked:

select * from table where ? lock in share mode;
select * from table where ? for update;

There is a problem to explain

At the beginning of the test, I thought that using the begin statement was to start a transaction. So in the second test above, because transaction 1 was started first, the newly added data of transaction 2 was found in transaction 1. At that time, I thought that this was inconsistent with the select rules in the previous MVCC, so I did the following test:

SELECT * FROM information_schema.INNODB_TRX //Used to query the currently executing transaction

It can be seen that if you just execute the begin statement, a transaction is not actually started.

Next, add a select statement after begin:

Therefore, it is important to understand that a transaction is actually opened only after operations such as adding, deleting, modifying, and checking the data have been performed.

How to solve phantom reads

Obviously, the repeatable read isolation level cannot completely solve the problem of phantom reads. If we need to solve phantom reads in our project, there are two ways:

  • Using serializable read isolation level
  • MVCC+next-key locks: Next-key locks consist of record locks (index locks) and gap locks (gap locks, which lock not only the data to be used each time, but also the data near these data)

In fact, the above two methods are not used in many projects. The performance of serialized reading is too poor, and in fact, phantom reading is often completely acceptable to us.

Summarize

The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. Thank you for your support of 123WORDPRESS.COM.

You may also be interested in:
  • Example of using MySQL transaction features to implement concurrent and safe auto-increment ID
  • Solution to PHP+MySQL high-concurrency locked transaction processing problem
  • Detailed explanation of how MySQL solves phantom reads
  • Mysql transaction concurrency problem solution
  • Detailed explanation of MySQL phantom reads and how to eliminate them
  • MySQL Series 10 MySQL Transaction Isolation to Implement Concurrency Control
  • How to solve the phantom read problem in MySQL
  • mysql+mybatis implements stored procedure + transaction + multi-concurrent serial number acquisition
  • Detailed explanation of concurrent dirty read + non-repeatable read + phantom read in Mysql transactions

<<:  Vue integrates Tencent TIM instant messaging

>>:  Docker installs and runs the rabbitmq example code

Recommend

Summary of common commands for building ZooKeeper3.4 middleware under centos7

1. Download and decompress 1. Introduction to Zoo...

Explanation of the configuration and use of MySQL storage engine InnoDB

MyISAM and InnoDB are the most common storage eng...

How to upgrade MySQL 5.6 to 5.7 under Windows

Written in front There are two ways to upgrade My...

9 Practical Tips for Creating Web Content Pages

Content 1. Give readers a reason to stay. Make the...

Detailed tutorial for upgrading zabbix monitoring 4.4 to 5.0

1. Zabbix backup [root@iZ2zeapnvuohe8p14289u6Z /]...

Solution to Nginx SSL certificate configuration error

1. Introduction When a web project is published o...

uniapp realizes the recording upload function

Table of contents uni-app Introduction HTML part ...

How to view and optimize MySql indexes

MySQL supports hash and btree indexes. InnoDB and...

About 3 common packages of rem adaptation

Preface I wrote an article about rem adaptation b...

Tutorial on using iostat command in Linux

Preface It is said that if the people doing opera...

How to force vertical screen on mobile pages

I recently wrote a mobile page at work, which was...

Detailed explanation of how to install PHP7 on Linux

How to install PHP7 on Linux? 1. Install dependen...