Summary of MySql index, lock, and transaction knowledge points

Summary of MySql index, lock, and transaction knowledge points

This article summarizes the knowledge points of MySql indexes, locks, and transactions. Share with you for your reference, the details are as follows:

1. Index

An index is similar to the table of contents of a book, and you can immediately find the corresponding record based on a page number in the table of contents.

Advantages of indexing:

  1. Natural sorting.
  2. Quick search.

Disadvantages of indexing:

  1. Takes up space.
  2. Slow down the rate at which you update the table.

Note : It is faster to use full table scan for small tables, and indexes are used for medium and large tables. Super large table indexes are basically ineffective.

In terms of implementation, indexes are divided into two types: clustered indexes and auxiliary indexes (also called secondary indexes or non-clustered indexes)

Functionally, there are 6 types: normal index, unique index, primary key index, composite index, foreign key index, and full-text index.

Let's talk about the 6 types of indexes in detail:

  1. Normal index: The most basic index without any constraints.
  2. Unique index: Similar to a normal index, but with a uniqueness constraint.
  3. Primary key index: A special unique index that does not allow null values.
  4. Composite index: Multiple columns are combined together to create an index, which can cover multiple columns.
  5. Foreign key index: Only InnoDB type tables can use foreign key indexes to ensure data consistency and integrity and implement cascading operations.
  6. Full-text index: The full-text index that comes with MySQL can only be used for InnoDB and MyISAM, and can only perform full-text searches on English. Generally, a full-text index engine (ES, Solr) is used.

Note : The primary key is the unique index, but the unique index is not necessarily the primary key. The unique index can be empty, but there can only be one empty value, and the primary key cannot be empty.

In addition, InnoDB clusters data by primary key. If no primary key and no clustered index are defined, MySql will select a unique non-empty index instead. If there is no such index, a 6-byte primary key will be implicitly defined as the clustered index, which cannot be viewed or accessed by users.

To put it simply:

  1. When you set a primary key, a unique index is automatically generated. If there is no clustered index before, the primary key is the clustered index.
  2. When no primary key is set, a non-empty unique index is selected as the clustered index. If there is none, an implicit 6-byte index is generated.

MySql stores data in pages, and the default page size is 16kb. When you query, it will not only load a certain piece of data, but load all the pages where the data is located into pageCache. This is actually similar to the OS's nearby access principle.

MySql's index uses a B+ tree structure. Before talking about B+ tree, let's talk about B tree first. B tree is a multi-way balanced search tree. Compared with ordinary binary tree, it will not be extremely unbalanced, and it is also multi-way.

The characteristic of B-tree is that it also stores data in non-page child nodes.

As you can see from the picture:

This feature means that non-page child nodes cannot store a large number of indexes.

The B+ Tree optimizes the B tree for this purpose. As shown in the following figure:

We can see that B+ Tree stores all data in leaf nodes, and non-child nodes only store indexes and pointers.

We assume that a non-page child node is 16kb, each index, i.e. primary key is bigint, i.e. 8b, and the pointer is 8b. Then each page can store about 1000 indexes (16kb/8b + 8b).

How many indexes can a 3-layer B+ tree store? As shown below:

Ability to store approximately 1 billion indexes. Usually the height of a B+ tree is 2-4 layers. Since the root node is resident in memory when MySQL is running, each search only requires about 2-3 IOs. It can be said that the design of the B+ tree is based on the characteristics of mechanical disks.

Knowing the design of the index, we can know some other information:

  1. The primary key of MySql cannot be too large. If you use UUID, the non-leaf nodes of the B+ tree will be wasted.
  2. The primary key of MySql is preferably auto-incremental. If UUID is used, the B+ tree will be adjusted for each insertion, which will cause page splits and seriously affect performance.

So, if sharding is used in the project, we usually need a primary key for sharding, so what should we do? In terms of implementation, we can retain the auto-increment primary key and use the logical primary key as the unique index.

2. Locking mechanism

When it comes to MySQL locks, various concepts will emerge. In fact, locks have several dimensions. Let us explain.

1. Type Dimension

  • Shared lock (read lock/S lock)
  • Exclusive lock (write lock/X lock)

Type breakdown:

  • Intention shared lock

  • Intention exclusive (mutual exclusion) lock

  • Pessimistic lock (use lock, i.e. for update)
  • Optimistic locking (using the version number field, similar to the CAS mechanism, that is, the user controls it himself. Disadvantage: when the concurrency is very high, there are many useless retries)

2. Lock granularity (granularity dimension)

  • Table Lock
  • Page Lock (MySQL BerkeleyDB Engine)
  • Row Locks (InnoDB)

3. Lock Algorithm (Algorithm Dimension)

  • Record Lock (single-line record)
  • Gap Lock (locks a range but does not include the locked record)
  • Next-Key Lock (Record Lock + Gap Lock, lock a range and lock the record itself. MySql uses this lock to prevent phantom reads)

4. Is the default read operation locked?

  • By default, the MVCC mechanism ("consistent non-locking read") ensures the correctness of RR-level isolation and is not locked.

You can choose to lock manually: select xxxx for update (exclusive lock); select xxxx lock in share mode (shared lock), which is called "consistent locking read".

After using the lock, phantom reads can be avoided at the RR level. Of course, the default MVCC read can also avoid phantom reads.

Since RR can prevent phantom reads, what is the use of SERIALIZABLE?

Prevent lost updates. For example, the following figure:

At this time, we must use the SERIALIZABLE level for serial reading.

Finally, the implementation principle of row lock is to lock the clustered index. If you do not hit the index correctly when querying, the MySql optimizer will abandon the row lock and use the table lock.

3. Transactions

Transactions are an eternal topic in databases, ACID: atomicity, consistency, isolation, and persistence.

Of the four characteristics, the most important is consistency. Consistency is guaranteed by atomicity, isolation, and persistence.

  • Atomicity is guaranteed by the Undo log. Undo Log saves the records before each change, so that it can be rolled back when an error occurs.
  • Isolation is ensured by MVCC and Lock. I will talk about this later.
  • Durability is guaranteed by Redo Log. Each time before actually modifying the data, the record will be written to the Redo Log. Only when the Redo Log is written successfully will it be truly written to the B+ tree. If there is a power outage before submission, the record can be restored through the Redo Log.

Then let’s talk about isolation.

Isolation Level:

  1. Read Uncommitted (RU)
  2. Read Committed (RC)
  3. Repeatable Read (RR)
  4. serializable

Each level solves different problems, usually 3 problems: dirty reads, non-repeatable reads, and phantom reads. A classic picture:

Here is a point to note about phantom reads. In the database specification, the RR level will cause phantom reads. However, due to MySQL optimization, the RR level of MySQL will not cause phantom reads: when using the default select, MySQL uses the MVCC mechanism to ensure that there will be no phantom reads; you can also use locks. When using locks, such as for update (X lock), lock in share mode (S lock), MySQL will use Next-Key Lock to ensure that there will be no phantom reads. The former is called snapshot read, and the latter is called current read.

Principle analysis:

  • The reason why RU dirty read occurs: The RU principle is to lock the row record of each update statement instead of locking the entire transaction, so dirty read occurs. RC and RR will lock the entire transaction.
  • The reason why RC cannot read repeatedly: RC generates a new Read View each time it executes a SQL statement, and what is read each time is different. The RR transaction uses the same Read View from beginning to end.
  • The reason why RR does not have phantom reads is as mentioned above.

So what is the difference between RR and Serializble? A: Lost updates. The lock part of this article has been mentioned.

MVCC Introduction: The full name is multi-version concurrency control.

Each clustered index of innoDB has 4 hidden fields, namely the primary key (RowID), the transaction ID of the most recent change (MVCC core), the pointer of the Undo Log (isolation core), and the index deletion mark (when deleting, it will not be deleted immediately, but marked and then deleted asynchronously);

Essentially, MVCC is implemented using the Undo Log linked list.

MVCC implementation method: The transaction modifies the original data in an exclusive lock, stores the data before the modification in the Undo Log, and associates the data with the rollback pointer. If the modification is successful, nothing is done. If the modification fails, the data in the Undo Log is restored.

One more thing, we usually think that MVCC is similar to optimistic locking, that is, using version numbers, but in fact, InnoDB is not implemented in this way. Of course, this does not affect our use of MySql.

Readers who are interested in more MySQL-related content can check out the following topics on this site: "Summary of MySQL Index Operation Skills", "Summary of MySQL Common Functions", "Summary of MySQL Log Operation Skills", "Summary of MySQL Transaction Operation Skills", "Summary of MySQL Stored Procedure Skills" and "Summary of MySQL Database Lock-Related Skills".

I hope this article will be helpful to everyone's MySQL database design.

You may also be interested in:
  • Detailed analysis of MySQL index transactions
  • MySQL Database Indexes and Transactions
  • Detailed explanation of transactions and indexes in MySQL database
  • Mysql database advanced usage of views, transactions, indexes, self-connections, user management example analysis
  • MySql knowledge points: transaction, index, lock principle and usage analysis
  • Mysql transaction index knowledge summary

<<:  Writing a web calculator using javascript

>>:  Public free STUN servers

Recommend

15 Best Practices for HTML Beginners

Here are 30 best practices for HTML beginners. 1....

Several CSS3 tag shorthands (recommended)

border-radius: CSS3 rounded corners Syntax: borde...

How to install mysql5.7 in windows

First download the compressed version of mysql, t...

Top 10 Js Image Processing Libraries

Table of contents introduce 1. Pica 2. Lena.js 3....

vite2.x implements on-demand loading of ant-design-vue@next components

1. Use version vite:2.0 ant-design-vue: 2.0.0-rc....

MySQL implements a solution similar to Oracle sequence

MySQL implements Oracle-like sequences Oracle gen...

How to install and deploy ftp image server in linux

Refer to the tutorial on setting up FTP server in...

How is MySQL transaction isolation achieved?

Table of contents Concurrent scenarios Write-Writ...

Detailed explanation of template tag usage (including summary of usage in Vue)

Table of contents 1. Template tag in HTML5 2. Pro...

How to insert video into HTML and make it compatible with all browsers

There are two most commonly used methods to insert...

The difference between mysql outer join and inner join query

The syntax for an outer join is as follows: SELEC...

Several ways to change MySQL password

Preface: In the daily use of the database, it is ...

How to use Docker plugin to remotely deploy projects to cloud servers in IDEA

1. Open port 2375 Edit docker.service vim /lib/sy...

MySQL database development specifications [recommended]

Recently, we have been capturing SQL online for o...