How to solve the phantom read problem in MySQL

How to solve the phantom read problem in MySQL

Preface

We know that MySQL cannot see the content submitted by other transactions under the repeatable read isolation level. Under the committable isolation level, you can see other transactions committing. If our business scenario is that the data we need to see for the same two queries within a transaction are consistent and cannot be affected by other transactions, we use the repeatable read isolation level. In this case, ordinary queries (snapshot reads) at the RR level rely on MVCC to solve the "phantom read" problem. If it is a "current read" situation, what do we need to rely on to solve the "phantom read" problem? That’s what this blog post is about.

Before discussing this, you can read the previous blog post (How does MySQL implement transaction isolation?), which mainly introduces the specific technical details of the isolation level. It may be more helpful to read this article after reading it.

Note: The "phantom reads" discussed in this blog post refer to those performed under the "repeatable read" isolation level.

1. What is phantom reading?

Assume that we have a table t with the following structure, where the initial data lines are: (0,0,0),(1,1,1),(2,2,2),(3,3,3),(4,4,4),(5,5,5)

CREATE TABLE `t`
(
    `id` INT(11) NOT NULL,
    `key` INT(11) DEFAULT NULL,
    `value` INT(11) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `value` (`value`)
)ENGINE = InnoDB;
INSERT INTO t
VALUES (0, 0, 0),
       (1, 1, 1),
       (2, 2, 2),
       (3, 3, 3),
       (4, 4, 4),
       (5, 5, 5)

Assume select * from where value=1 for update, and lock only this row (note that this is just an assumption), and do not lock other rows, then the following scenario will occur:

Session A's three queries Q1-Q3 are all select * from where value=1 for update, querying all rows where value=1.

  • T1: Q1 returns only one row (1,1,1);
  • T2: Session B updates the value of id=0 to 1. At this time, there are two rows of data with value=1 in table t.
  • T3: Q3 returns two rows (0,0,1), (1,1,1)
  • T4: Session C inserts a row (6,6,1). At this time, there are three rows with value=1 in table t.
  • T5: Q3 returns three rows (0,0,1), (1,1,1), (6,6,1)
  • T6: Session A transaction commits.

The phenomenon that Q3 reads value=1 is called phantom read. Phantom read means that when a transaction queries the same range twice, the latter query sees rows that the former query did not see.

First, let’s explain “phantom reading” as follows:

  • Under the repeatable read isolation level, ordinary queries are snapshot reads and will not see data inserted by other transactions. Therefore, phantom reads only appear under "current read" (for update in all three queries indicates current read);
  • The update results of session B above are seen by the select statement after session A using "current read", which cannot be called phantom read. Phantom read only refers to "newly inserted rows".

2. What are the problems with phantom reading?

(1) Need to be solved separately

As we all know, the select ...for update statement locks the corresponding data row. For example, the Q1 query statement of session A at time T1: select * from where value=1 for update locks the data row with value=1. However, if the above scenario occurs, the semantics of for update is destroyed (the data row with value=1 is not locked).

Even if all records are locked, new records cannot be prevented from being inserted, so the "phantom read" problem must be solved separately. It cannot be solved by MVCC or row locking mechanism. This brings us to "gap lock", another locking mechanism.

(2) Concurrency caused by gap locks

The introduction of gap locks may cause the same statement to lock a larger range, which may affect concurrency. Please see the following introduction for details

3. How to solve phantom reading?

The reason for phantom reads is that row locks can only lock rows, but the action of inserting new records updates the "gaps" between records. Therefore, in order to solve the phantom read problem, InnoDB had to introduce a new lock, namely the Gap Lock.

Gap: For example, add 6 records to the table: 0, 5, 10, 15, 20, 25. This results in 7 gaps:

During the row-by-row scanning process, not only row locks are added to the rows, but also gap locks are added to the spaces on both sides of the rows. This ensures that no new records can be inserted.

Gap locks and row locks are collectively called next-key locks. Each next-key lock is an open-first-closed interval (gap lock open interval, next-key lock open-first-closed interval):

There is no conflict between gap locks. The conflict is inserting a record into the gap.

There is no data value=7 in table t, so Q1 adds a gap lock (1,5), and Q2 also adds this gap lock. The two do not conflict with each other and are both to protect the gap from being inserted.

After table t is initialized, assume that the data in the table is as follows:

If you use select * from for update to execute, all records in the entire table will be locked, forming 7 next-key locks, namely (-∞,0], (0,2], (2,4], (4,6], (6,8], (8, 10], (10, +supremum]

The introduction of gap locks may cause the same statement to lock a larger range, which will affect concurrency.

Assume the following scenario:

Then a deadlock obviously occurred, the analysis is as follows:

  • Q1: Execute the select ...for update statement. Since the row with id=9 does not exist, a gap lock (8,10) will be added.
  • Q2: When executing the select ...for update statement, the gap lock (8,10) is also added. There is no conflict between the gap locks, so this statement can be executed successfully.
  • Session B tries to insert a row (9,9,9), but is blocked by the gap lock of session A and has to wait.
  • Session A attempts to insert a row (9,9,9) and is blocked by session B's gap lock.

As mentioned above, the introduction of gap locks may cause the same statement to lock a larger range, which actually affects concurrency.

In order to solve the phantom read problem, the read committable isolation level can be used. The gap lock will only take effect under the repeatable read isolation level. So if the isolation level is set to read committed, there will be no gap locks. But at the same time, if you want to solve the possible inconsistency between data and logs, you need to set the binlog format to row, that is, use the combination of "RC isolation level + log format binlog_format=row".

Conclusion

  • Gap locks are only valid at the RR isolation level, and there are no gap locks at the RC isolation level;
  • To solve the "phantom read" problem at the RR isolation level: "snapshot read" relies on MVCC control, and "current read" is solved by gap lock;
  • Gap locks and row locks are collectively called next-key locks. Each next-key lock is an open-first-closed interval.
  • The introduction of gap locks may cause the same statement to lock a larger range, affecting concurrency.

This is the end of this article on how to solve the phantom read problem in MySQL. For more information about MySQL phantom reads, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Example of using MySQL transaction features to implement concurrent and safe auto-increment ID
  • Solution to PHP+MySQL high-concurrency locked transaction processing problem
  • Can MySQL's repeatable read level solve phantom reads?
  • Detailed explanation of how MySQL solves phantom reads
  • Mysql transaction concurrency problem solution
  • Detailed explanation of MySQL phantom reads and how to eliminate them
  • MySQL Series 10 MySQL Transaction Isolation to Implement Concurrency Control
  • mysql+mybatis implements stored procedure + transaction + multi-concurrent serial number acquisition
  • Detailed explanation of concurrent dirty read + non-repeatable read + phantom read in Mysql transactions

<<:  Example of using nested html pages (frameset usage)

>>:  JavaScript exquisite snake implementation process

Recommend

Use of MySQL query rewrite plugin

Query Rewrite Plugin As of MySQL 5.7.6, MySQL Ser...

How to configure nginx to return text or json

Sometimes when requesting certain interfaces, you...

Docker's flexible implementation of building a PHP environment

Use Docker to build a flexible online PHP environ...

How to uninstall MySQL 5.7.19 under Linux

1. Find out whether MySQL was installed before Co...

Understanding JSON (JavaScript Object Notation) in one article

Table of contents JSON appears Json structure Jso...

js implements a simple shopping cart module

This article example shares the specific code of ...

Linux parted disk partition implementation steps analysis

Compared with fdisk, parted is less used and is m...

mysql indexof function usage instructions

As shown below: LOCATE(substr,str) Returns the fi...

Tips and precautions for using MySQL index

1. The role of index In general application syste...

What does mysql database do

MySQL is a relational database management system ...

How to Learn Algorithmic Complexity with JavaScript

Table of contents Overview What is Big O notation...

How to use Docker to build a development environment (Windows and Mac)

Table of contents 1. Benefits of using Docker 2. ...

Detailed explanation of the interaction between React Native and IOS

Table of contents Prerequisites RN passes value t...

How to use worker_threads to create new threads in nodejs

Introduction As mentioned in the previous article...