MySQL 8.0: MVCC for Large Objects in InnoDB In this article, I will explain the Multi-Version Concurrency Control (MVCC) design for Large Objects (LOBs) in MySQL InnoDB storage engine. MySQL 8.0 has a new feature that allows users to partially update large objects, including JSON documents. With this partial update feature, the way MVCC works for LOBs has changed when a LOB is partially updated. For normal updates (full updates), MVCC will work just like previous versions. Let's look at how MVCC works when partial updates are not involved, and then consider the use case of partial updates to LOBs. MVCC General Updates I use the term regular updates to refer to updates that are not partial updates. I will explain how MVCC can be used to routinely update large objects with an example. I will use the following mtr(1) test case for this purpose: create table t1 ( f1 int primary key , f2 longblob ) engine = innodb ; insert into t1 values ( 1 , repeat ( 'a' , 65536 ) ) ; start transaction; update t1 set f2 = repeat ( 'b' , 65536 ) where f1 = 1 ; -- echo # Connection con1: -- For users who use the MySQL client, you may need to establish a new link by opening another terminal window, the same below. connect ( con1 , localhost , root , , ) ; -- echo # Must see the old value 'aaaaaaaaaaa' select f1 , right ( f2 , 10 ) from t1 order by f1 ; -- echo # Connection default: connection default; disconnect con1; commit ; drop table t1 ; In order to understand the following explanation, it is very important to understand the above test case carefully. The test scenario is as follows:
Table t1 contains only one record (R1). But trx1 and trx2 will see two different values. The table actually contains only the latest values (the values seen by trx1), while the values or records seen by trx2 are obtained from the undo log records. Let's look at the picture below to understand it better. Initial state: before update operation The following figure shows the situation before the update operation. The undo log is empty. The clustered index of a table contains one row. There is a LOB in the table. The clustered index record contains a reference to the LOB. Final state: After the update operation Now let's look at what happens after the update operation. Here are some important observations:
The steps taken by the transaction in connection 1 are as follows:
This is how MVCC for LOBs works when no partial updates are involved. MVCC Partial Update Let's look at another example to understand how MVCC works in the case of partial updates. We need another example because partial updating of JSON documents is currently only supported through the functions json_set() and json_replace(). create table t2 ( f1 int primary key , j json ) engine = InnoDB ; set @ elem_a = concat ( '"' , repeat ( 'a' , 200 ) , '"' ) ; set @ elem_a_with_coma = concat ( @ elem_a , ',' ) ; set @json_doc = concat ( "[" , repeat ( @elem_a_with_coma , 300 ) , @elem_a , "]" ) ; insert into t2 ( f1 , j ) values ( 1 , @ json_doc ) ; start transaction; update t2 set j = json_set ( j , '$[200]' , repeat ( 'b' , 200 ) ) where f1 = 1 ; -- echo # Connection con1: connect ( con1 , localhost , root , , ) ; -- echo # Must see the old value 'aaaaaaaaaaa...' select json_extract ( j , '$[200]' ) from t2 ; -- echo # Connection default: connection default; disconnect con1; commit ; The scenario is the same as the previous example. Just the longblob field has been changed to a JSON document. The data loaded is also slightly different to conform to the JSON format. Tip: You can add the statement set debug = '+d, innodb_lob_print' to the above mtr test cases (in both) to print the LOB indexes in the server log file. The LOB index will be printed immediately after insertion. The LOB index will give you the structure of the stored LOB object. The initial conditions before the full or partial update operation are the same and have been given above. But in the figure below, some additional information is provided. Let's look at some other information shown in the graph:
Additional information is the LOB version number. This is available in the LOB reference in the clustered index record and in each entry of the LOB index. After a partial update operation The following diagram illustrates the situation after a partial update operation. The most important optimization here is that there is still only one LOB in the user tablespace. Update only those LOB data pages that need to be modified. This single LOB after the partial update operation contains both the old and new versions of the LOB. This is illustrated by the v1 and v2 labels on the LOB data page in the figure. Another important observation is that the LOB references in the undo log and clustered index records point to the same LOB. But the LOB reference contains a different version number. The LOB reference in the undo log record contains v1 (the old version number), and the LOB reference in the clustered index record contains the new version number v2. Purpose of LOB version numbers As shown above, different LOB references with different version numbers point to the same LOB. A single LOB contains parts from different versions. The LOB version number is used to get the correct version that the various LOB references point to. In this section, we will see how this is done. A LOB index contains a list of LOB pages that make up a LOB. It contains the page number of the LOB data page, the amount of data contained in each LOB data page, and a version number. Each node of this list is called a LOB index entry. Each LOB index entry contains a list of old versions. Let's look at a diagram illustrating the structure of the above partial update test case. Initially, before the partial update is done, the LOB index contains a total of 4 entries. The page numbers of the four entries are 5, 6, 7, and 8. No LOB index entry has an old version. All four entries have a version number of 1. After some of the updates were completed, we noticed that page 9 had replaced page 7, and page 7 was now considered an older version of page 9. Page number 9 has version number 2, and page number 7 has version number 1. After the partial update is complete, when the LOB is accessed through a LOB reference with version number 1, the first index entry on page 5 is looked at. It has a version number of 1. If the version number in the index entry is less than or equal to the version number in the LOB reference, that entry will be read. Therefore, page 5 will be read. Then the index entry for page 6 will be viewed. It has version number 1, so that will be read. Then the index entry for page number 9 will be viewed. It has version number 2. But the lob reference has version number 1. If the version number in the index entry is greater than the version number in the LOB reference, the entry will not be read. Since the entry for page number 9 has version 2, its older version will be viewed. The index entry with page number 7 will be checked. It has version number 1, so that will be read. After this, the index entry with page number 8 is checked. It has version number 1, so it will also be read. This is how you access legacy LOBs. After the partial update is complete, when the LOB is accessed through the LOB reference with version number 2, the first index entry on page 5 is looked at. It has a version number of 1. If the version number in the index entry is less than or equal to the version number in the LOB reference, that entry will be read. So it will read page numbers 5,6,9,8 in sequence. Since the version number is always <= 2, there is no need to use an older version to access page number 7. One thing to remember is that LOBs do not exist independently in InnoDB. It is considered an extension of the clustered index record. Whether a LOB is visible to a transaction is not handled by the LOB module. The LOB module only processes clustered index records. If a transaction accesses a LOB, it means that it has determined with the help of the DB_TRX_ID in the clustered index record that it can look at the LOB (not a specific version of the LOB). So we don't worry about that aspect in the LOB module. We focus only on providing the correct content for a given LOB version number. in conclusion In this article, we saw how MVCC is done in InnoDB for large objects. When a LOB is partially updated, multiple LOB references can point to the same LOB. But they will have different version numbers. Using these LOB version numbers, the correct LOB content can be accessed. Hope you found this information useful. Thank you for using MySQL! Notes: (1) Mtr is the abbreviation of Mini-transaction, which literally means small thing. Compared with logical things, we call it physical things. Belongs to the underlying module of the Innodb storage engine. Mainly used for locking and logging information. Summarize The above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support for 123WORDPRESS.COM. You may also be interested in:
|
>>: Problems encountered when uploading images using axios in Vue
This article example shares the specific code of ...
We all know that the underlying data structure of...
Data backup and restoration part 2, as follows Ba...
This article example shares the specific code of ...
Just like code, you can add comments to tables an...
GNU Parallel is a shell tool for executing comput...
Event response refresh: refresh only when request...
We all have files stored on our computers -- dire...
This article introduces the CSS scrollbar selecto...
Starting from MySQL 8.0.16, you can set a passwor...
JBoss uses Tomcat as the Web container, so the co...
Preface MRR is the abbreviation of Multi-Range Re...
Setting min-width and max-width properties in tab...
Table of contents 1. Traversal Class 1. forEach 2...
Preface The location in the server block in the N...