A Deep Dive into the MySQL InnoDB Storage Engine

Preface

In MySQL's layered architecture, InnoDB sits at the storage engine layer and is integrated into the server as a plugin. Starting with MySQL 5.5.8, InnoDB became the default storage engine. InnoDB supports transactions and is designed primarily for OLTP workloads. Its main features include transaction support, row-level locking for high concurrency, foreign keys, automatic crash recovery, and clustered indexes that organize the table data.

System Architecture

The InnoDB storage engine consists of three parts: an in-memory pool, background threads, and on-disk storage.

Threads

InnoDB uses a multi-threaded model: multiple background threads are responsible for handling different tasks.

Master Thread

The Master Thread is the core background thread. It is mainly responsible for asynchronously flushing data from the buffer pool to disk to keep the data consistent, including flushing dirty pages, merging the insert buffer, and recycling undo pages.

IO Thread

The InnoDB storage engine makes extensive use of asynchronous IO (AIO) to process IO requests. The main job of the IO threads is to handle the callbacks of these IO requests.

Purge Thread

After a transaction commits, the undo logs it used may no longer be needed, so the Purge Thread reclaims the undo pages that were allocated and used. InnoDB supports multiple purge threads, which speeds up undo page reclamation, makes better use of the CPU, and improves storage engine performance.

Page Cleaner Thread

The Page Cleaner Thread takes over the dirty-page flushing originally done in the Master Thread. Its purpose is to lighten the Master Thread's workload, reduce blocking of user query threads, and further improve InnoDB performance.

Memory

(Figure: InnoDB storage engine memory structure)

Buffer Pool

The InnoDB storage engine is disk-based and manages its records in pages. However, because of the speed gap between the CPU and the disk, disk-based database systems usually use a buffer pool of in-memory pages to improve the overall performance of the database.

The buffer pool uses the speed of memory to compensate for the impact of slow disks on database performance. On a read, the page is first loaded from disk into the buffer pool; the next time the same page is read, it is served directly from the buffer pool, which thus acts as a cache.

For data modification operations, the page data in the buffer pool is first modified, and then refreshed to the disk using a mechanism called Checkpoint.

The size of the buffer pool directly affects the overall performance of the database. For the InnoDB storage engine, the buffer pool configuration is set through the parameter innodb_buffer_pool_size. Use the SHOW VARIABLES LIKE 'innodb_buffer_pool_size' command to view the buffer pool configuration:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size' \G
*************************** 1. row ***************************
Variable_name: innodb_buffer_pool_size
 Value: 134217728
1 row in set (0.01 sec)
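
Since MySQL 5.7.5 the buffer pool can also be resized online. A quick sketch (the 2 GB value is illustrative; the target must satisfy the chunk-size constraint noted in the comment):

```sql
-- Illustrative value only: resize the buffer pool online (MySQL >= 5.7.5).
-- The target must be a multiple of
-- innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2 GB

-- Track the progress of the resize operation:
SHOW STATUS LIKE 'InnoDB_buffer_pool_resize_status';
```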

The types of pages cached in the buffer pool include index pages, data pages, undo pages, the insert buffer, the adaptive hash index, InnoDB lock information, data dictionary information, etc. Index pages and data pages make up the largest part of the buffer pool.

Redo log buffer

When pages in the buffer pool are newer than those on disk, the new data must eventually be flushed to disk. InnoDB uses a Write-Ahead Log strategy: when a transaction commits, the log is first written to the redo log buffer; the redo log buffer is flushed to the redo log file at a certain frequency, and the dirty pages are then flushed to disk according to the checkpoint mechanism.

The redo log buffer does not need to be very large; 8 MB is usually enough for most application scenarios. The redo log buffer is flushed to the redo log file in the following three situations:

  • The Master Thread flushes the redo log buffer to the redo log file every second
  • Flushing the redo log buffer to the redo log file each time a transaction commits
  • When the remaining space in the redo log buffer falls below half, the redo log buffer is flushed to the redo log file
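
The per-commit flushing in the second bullet is configurable through innodb_flush_log_at_trx_commit; a quick sketch of its settings:

```sql
-- innodb_flush_log_at_trx_commit controls redo log flushing on commit:
--   0: write and flush the redo log roughly once per second
--      (up to ~1 second of transactions can be lost on a crash)
--   1: write and flush at every commit (the default; full durability)
--   2: write at every commit, flush roughly once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```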


Additional memory pool

In the InnoDB storage engine, memory is managed through a mechanism called a memory heap. When allocating memory for certain internal data structures, InnoDB requests it from an additional memory pool; when that area runs short, memory is requested from the buffer pool instead.

Lock

The locks supported by InnoDB are:

  • Shared locks and exclusive locks
  • Intention Lock
  • Record Lock
  • Gap Lock
  • Auto-increment lock

Shared locks and exclusive locks

The InnoDB engine implements the two standard row-level locks: shared (S) locks and exclusive (X) locks. A shared lock allows the holding transaction to read a row, while an exclusive lock allows the holding transaction to write (update or delete) a row.

If a transaction holds a shared lock on a row, other transactions can still obtain a shared lock on that row, but they cannot obtain an exclusive lock on it. Once a transaction obtains an exclusive lock on a row, other transactions can obtain neither shared nor exclusive locks on that row.
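
A sketch of this compatibility rule in a two-session demo (the accounts table and id value are hypothetical; LOCK IN SHARE MODE is spelled FOR SHARE in MySQL 8.0):

```sql
-- Session A: take a shared (S) lock on one row
START TRANSACTION;
SELECT * FROM accounts WHERE id = 1 LOCK IN SHARE MODE;

-- Session B: a second S lock on the same row succeeds immediately...
SELECT * FROM accounts WHERE id = 1 LOCK IN SHARE MODE;
-- ...but an X lock request blocks until session A commits or rolls back:
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
```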

Intention Lock

In InnoDB, intention locks are table-level locks, and they come in shared and exclusive variants:

  • Intention shared lock (IS): the transaction intends to acquire a shared lock on some rows
  • Intention exclusive lock (IX): the transaction intends to acquire an exclusive lock on some rows

A transaction must acquire the corresponding intention lock on the table before acquiring a shared or exclusive lock on a row. An intention lock does not block row-level operations by other transactions; it only announces that the holder intends to acquire a shared or exclusive lock on some row, and therefore conflicts only with table-level lock requests.

Record Lock

A record lock is a lock on an index record: it locks the index entry for a record rather than the record itself. If the table has no index, InnoDB creates a hidden clustered index for it, and record locks then lock that hidden clustered index.

Gap Lock

Gap locks, like record locks, act on indexes. The difference is that a record lock locks a single index record, while a gap lock locks a range between index records. The sole purpose of gap locks in InnoDB is to prevent other transactions from inserting into the gap, thereby preventing phantom reads.

Auto-increment lock

The auto-increment lock (AUTO-INC lock) is a special table-level lock that applies only to inserts into tables with auto-increment columns. While one transaction is inserting, any other transaction that wants to insert must wait until the first insert statement completes (not the whole transaction) before it can acquire the lock and perform its own insert.
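
This behavior is tunable through the innodb_autoinc_lock_mode startup option; the values are summarized below to the best of my knowledge (it cannot be changed at runtime):

```sql
-- innodb_autoinc_lock_mode (startup option, read-only at runtime):
--   0 "traditional": table-level AUTO-INC lock held until the insert
--     statement finishes
--   1 "consecutive": a lightweight mutex for simple inserts
--     (the default before MySQL 8.0)
--   2 "interleaved": no table-level lock at all; fastest, but only safe
--     with row-based binary logging (the default in MySQL 8.0)
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
```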

Transactions

ACID

Transactions are the most important feature of an OLTP database. Any discussion of transactions starts with the four ACID properties:

  • Atomicity: a transaction is the smallest unit of work; its operations either all succeed or all fail
  • Consistency: a transaction must leave the database's integrity intact from its start to its end
  • Isolation: Different transactions do not affect each other. The four isolation levels are RU (read uncommitted), RC (read committed), RR (repeatable read), and SERIALIZABLE (serializable).
  • Durability: After a transaction is committed, changes to the data are permanent and will not be lost even if the system fails.

InnoDB achieves atomicity, durability, and consistency mainly through the redo log, the undo log, and the Force Log at Commit mechanism. The redo log is used to recover data after a crash; the undo log is used to undo a transaction's effects and also supports multi-version concurrency control. Force Log at Commit ensures that the redo logs are persisted when a transaction commits. Isolation is ensured by locks and MVCC.

Isolation Level

In MySQL, there are four transaction isolation levels:

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • Serializable
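
The current level can be inspected and changed per session; a small sketch (the variable is named tx_isolation in older versions and transaction_isolation from MySQL 5.7.20 on):

```sql
-- Inspect the current isolation level
-- (use @@tx_isolation on versions before 5.7.20):
SELECT @@transaction_isolation;

-- Change it for the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```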

Before understanding the four isolation levels, we need to understand three other terms:

  • Dirty Read

Transaction A reads data that transaction B has not yet committed, but B is then rolled back for some reason. The data A has read is therefore invalid, which can lead to anomalous results.

  • Non-repeatable read

Transaction A queries the same data several times during its lifetime, while transaction B updates or deletes that data in the meantime. The result of each of A's queries may then be different.

  • Phantom Read

The effect of a phantom read is similar to that of a non-repeatable read. The difference is that non-repeatable reads are caused by other transactions' update and delete operations, whereas phantom reads are caused by inserts: within the lifetime of one transaction, rows newly inserted by another transaction show up in its queries.

Read uncommitted

At this level, transaction A can see data that transaction B has not yet committed. If B is then rolled back, A has read dirty data; this is what "dirty read" means.

This isolation level is generally not recommended in MySQL InnoDB.

Read Committed

Read committed: any change made by a transaction is invisible to other transactions until that transaction commits. This solves the dirty read problem, but non-repeatable reads and phantom reads still occur.

Repeatable Read

Repeatable read: this level guarantees that reading the same record multiple times within one transaction yields consistent results. In the InnoDB storage engine it solves both the non-repeatable read and the phantom read problem.

The InnoDB engine solves the phantom read problem with the Next-Key Lock, a combination of a record lock and a gap lock. When InnoDB scans index records, it first places a record lock on each matching index record and then gap locks on the gaps around it. Once a gap is locked, other transactions can neither modify records in that range nor insert into it.
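
A sketch of the effect, assuming a hypothetical table t with a secondary index on age and existing rows age = 10, 20, 30:

```sql
-- Session A (REPEATABLE READ): a locking range scan takes next-key locks
-- on the matching records and the gaps around them.
START TRANSACTION;
SELECT * FROM t WHERE age BETWEEN 15 AND 25 FOR UPDATE;

-- Session B: this insert falls into a locked gap and blocks until
-- session A commits or rolls back, so A cannot see a phantom row.
INSERT INTO t (age) VALUES (18);
```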

Serializable

Serializable is the highest isolation level. It avoids phantom reads by forcing transactions to execute serially. However, Serializable locks every row it reads, which can cause a large number of timeouts and heavy lock contention, so concurrency drops sharply. It is likewise rarely recommended with MySQL InnoDB.

Starting a Transaction

  • BEGIN, BEGIN WORK, START TRANSACTION

Executing the BEGIN command does not actually start a new transaction at the engine level; it only sets a flag on the current thread marking it as an explicitly started transaction.

  • START TRANSACTION READ ONLY

Once a read-only transaction has been started, any data-changing SQL received by the MySQL server is rejected with an error before it ever reaches the engine layer.

  • START TRANSACTION READ WRITE

Allows a super user to start a read-write transaction even when the current thread's read-only flag is set to true.

  • START TRANSACTION WITH CONSISTENT SNAPSHOT

Starting a transaction this way enters the engine layer and opens a read view immediately. This is effective only at the RR (REPEATABLE READ) isolation level; at other levels an error is reported.
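
The variants above can be combined in a session roughly as follows (a sketch; exact error behavior may differ slightly across MySQL versions):

```sql
-- Plain explicit transaction: only marks the thread; engine-level work
-- starts with the first statement that touches data.
BEGIN;
COMMIT;

-- Read-only transaction: data-changing SQL is rejected at the server layer.
START TRANSACTION READ ONLY;
ROLLBACK;

-- Open a read view immediately (meaningful only under REPEATABLE READ):
START TRANSACTION WITH CONSISTENT SNAPSHOT;
COMMIT;
```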

Undo log

When data is modified, a corresponding undo log record is written. If the transaction fails or is rolled back, the recorded undo log can be used to roll the change back. The undo log is a logical log that records the before-image of the data: if another reader needs the row while it is being modified, the previous version can be reconstructed from the version information recorded in the row. In addition, writing undo logs also generates redo logs, because the undo log itself must be protected by persistence.

Transaction Commit

  1. Use the global transaction ID generator to generate transaction NO, and add the transaction pointer (trx_t) of the current connection to the global committed transaction list (trx_serial_list)
  2. Mark undo. If this transaction only uses one UndoPage and the usage is less than 3/4 of the Page, mark this Page as TRX_UNDO_CACHED. If it does not meet the requirement and it is an insert undo, mark it as TRX_UNDO_TO_FREE. Otherwise, if the undo is an update undo, mark it as TRX_UNDO_TO_PURGE. Undo marked as TRX_UNDO_CACHED will be recycled by the engine.
  3. Put the update undo into the history list of the undo segment and increment rseg_history_len (global). At the same time, update TRX_UNDO_TRX_NO on the Page, and reset delete_mark if the data is deleted.
  4. Remove the update undo from update_undo_list. If it is marked as TRX_UNDO_CACHED, add it to the update_undo_cached queue.
  5. mtr_commit (log undo/redo is written to the public buffer), so far, the transaction is committed at the file level. At this time, even if the system crashes, the transaction can still be committed after restart. The next thing to do is to update the memory data status (trx_commit_in_memory)
  6. Read-only transactions only need to remove the readview from the global readview list and reset the information in the trx_t structure. The read-write transaction first needs to set the transaction state to TRX_STATE_COMMITTED_IN_MEMORY, release all row locks and remove trx_t from rw_trx_list, and remove readview from the global readview list. If there is an insert undo, remove it here. If there is an update undo, wake up the Purge thread to clean up the garbage. Finally, reset the information in trx_t for the next transaction.

Rollback

  • If it is a read-only transaction, return directly.
  • Determine whether to roll back the entire transaction or only part of it. For a partial rollback, record how many undo log records must be kept and roll back everything beyond them.
  • Find the most recent undo record among the update undo and insert undo logs, and start rolling back from it.
  • For an update undo record, clear rows marked as deleted and roll the updated data back to the oldest version. For an insert undo record, directly delete the clustered index and secondary index entries.
  • Stop when all undo records have been rolled back, or when the specified undo record is reached, and delete the undo log.

Index

The InnoDB engine uses a B+ tree as its index structure. The leaf nodes of the primary key index store the complete row data, while the leaf nodes of secondary (non-primary-key) indexes store the primary key values of the rows they point to.

In the InnoDB primary index (which is also the data file), the leaf nodes contain the complete data records. This kind of index is called a clustered index. Because InnoDB's data files are themselves clustered by primary key, InnoDB requires every table to have a primary key. If one is not specified explicitly, MySQL automatically selects a column that can uniquely identify the records as the primary key; if no such column exists, MySQL generates an implicit field for the InnoDB table to use as the primary key. This field is 6 bytes long and of long integer type.

The data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address; in other words, all InnoDB secondary indexes reference the primary key as their data. This clustered index design makes lookups by primary key very efficient, but a secondary index lookup requires two index searches: first search the secondary index to obtain the primary key, then use the primary key to retrieve the record from the primary index.
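
The second lookup can be avoided when a query needs only columns contained in the secondary index (a "covering index"). A hypothetical example; the table and column names are made up:

```sql
CREATE TABLE t (
  id   INT PRIMARY KEY,
  name VARCHAR(50),
  age  INT,
  KEY idx_name (name)
) ENGINE=InnoDB;

-- Covered by idx_name (the primary key id is stored in every secondary
-- index entry), so no second lookup; EXPLAIN shows "Using index":
EXPLAIN SELECT id, name FROM t WHERE name = 'alice';

-- Needs columns outside idx_name, so InnoDB must go back to the
-- clustered index with the primary key found in idx_name:
EXPLAIN SELECT * FROM t WHERE name = 'alice';
```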

Conclusion

This article introduces only a small part of MySQL InnoDB's many features. Interested readers can consult "MySQL Technology Insider: InnoDB Storage Engine" to learn more.
