Introduction to MySQL overall architecture

Introduction to MySQL overall architecture

The overall architecture of MySQL is divided into the server layer and the storage engine layer. The server layer includes connectors, query cache, analyzer, optimizer, executor, etc. Stored procedures, triggers, views and built-in functions are all implemented in this layer. The data engine layer is responsible for data storage and retrieval, such as InnoDB, MyISAM, Memory and other engines. After the client connects to the Server layer, the Server will call the interface provided by the data engine to make data changes.

Connectors

Responsible for establishing connections with clients, obtaining user permissions, and maintaining and managing connections.

Use show processlist ; to query the connection status. After a user establishes a connection, even if the administrator changes the permissions of the connected user, it will not affect the connected users. The default connection time is 8 hours, after which the connection will be disconnected.

Briefly describe the long connection:

Advantage: During the connection time, the client always uses the same connection, avoiding resource consumption of multiple connections.

Disadvantages: When MySQL is executed, the memory used is managed by the connection object. Since it is not released for a long time, it will cause system memory overflow and be killed by the system. Therefore, it is necessary to disconnect long connections regularly or disconnect after executing large queries. After MySQL 5.7, you can use mysql_rest_connection to initialize connection resources without reconnecting or doing permission verification.

Query Cache

When a query request is received, the query cache (key/value storage) will be queried to see if it has been executed. If not, follow the normal execution process.

But in actual situations, it is generally not necessary to set up a query cache. Because the cache will be cleared when the table involved in the query is updated. So it is suitable for static tables. After MySQL 8.0, query cache is abolished.

Analyzer

Lexical analysis:

Such as identifying select, table name, column name, and determining whether they exist.

Syntax analysis:

Determine whether the statement conforms to MySQL syntax.

Optimizer

Determine the use of indexes, the order of join tables, etc., and select the most optimized solution.

Actuator

Before executing a statement, a permission check will be performed first. If the permission is passed, the query will be performed using the interface provided by the data engine. If a slow query is set, you will see rows_examined in the corresponding log to indicate the number of rows scanned. In some scenarios (indexes), the executor is called once, but multiple rows are scanned in the data engine, so the number of rows scanned by the engine is not exactly the same as rows_examined .

Reason for not checking permissions in advance: For example, in cases such as triggers, permissions need to be determined in the executor stage and cannot be verified in the optimizer stage.

Use profiling to view SQL execution process

Turn on profiling to analyze the statement execution process:

mysql> select @@profiling;
+-------------+
| @@profiling |
+-------------+
| 0 |
+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

Execute the query statement:

mysql> SELECT * FROM s limit 10;
+------+--------+-----+-----+
| s_id | s_name | age | sex |
+------+--------+-----+-----+
| 1 | z | 12 | 1 |
| 2 | s | 14 | 0 |
| 3 | c | 14 | 1 |
+------+--------+-----+-----+
3 rows in set (0.00 sec)

Get profiles;

mysql> show profiles;
+----------+------------+--------------------------+
| Query_ID | Duration | Query |
+----------+------------+--------------------------+
| 1 | 0.00046600 | SELECT * FROM s limit 10 |
+----------+------------+--------------------------+

mysql> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000069 |
| checking permissions | 0.000008 | Opening tables | 0.000018 | init | 0.000019 | System lock | 0.000010 | optimizing | 0.000004 | statistics | 0.000013 |
| preparing | 0.000094 | Preparing | executing | 0.000016 | Sending data | 0.000120 |
| end | 0.000010 |
| query end | 0.000015 |
| closing tables | 0.000014 |
| freeing items | 0.000032 |
| cleaning up | 0.000026 |
+----------------------+----------+
15 rows in set, 1 warning (0.00 sec)

Query specific statements:

mysql> show profile for query 1;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000069 |
| checking permissions | 0.000008 |
| Opening tables | 0.000018 |
| init | 0.000019 |
| System lock | 0.000010 |
| optimizing | 0.000004 |
| statistics | 0.000013 |
| preparing | 0.000094 |
| executing | 0.000016 |
| Sending data | 0.000120 |
| end | 0.000010 |
| query end | 0.000015 |
| closing tables | 0.000014 |
| freeing items | 0.000032 |
| cleaning up | 0.000026 |
+----------------------+----------+
15 rows in set, 1 warning (0.00 sec)

MySQL Logging Module

As mentioned earlier, MySQL is divided into the server layer and the data engine layer, and each layer also corresponds to its own log file. If the InnoDB engine is selected, the corresponding file is the redo log file. The server layer corresponds to the binlog file. As for why there are two logging systems, let's look down.

redo log

Redo log is a log specific to InnoDB. Why do we need to introduce redo log? Imagine a scenario where MySQL needs to write data to disk files to ensure persistence. We know that when writing to disk, file IO and search operations will be performed. If this is the case for every update operation, the overall efficiency will be extremely low and it will be impossible to use it.

Since writing directly to disk is not possible, the solution is to write it into memory first, and then update it to disk when the system is idle. However, simply updating the memory is not enough. If the system crashes or restarts abnormally, the data in the memory that has not been written to the disk will be lost, and data consistency will become a problem. This is when the redo log comes into play. When an update operation occurs, InnoDb will first write the redo log (recording how the data has changed), then update the memory, and finally write it to the disk at an appropriate time, usually when the system is idle. The operation of writing logs first and then writing to disk is the commonly mentioned WAL (Write-Ahead-Logging) technology.

The emergence of redo log not only greatly improves efficiency, but also ensures that MySQL has crash-safe capabilities, and no data will be lost in abnormal situations.

In the specific implementation, the size of the redo log is fixed. It can be configured as a group of 4 files, each file is 1GB, and the four files are written in a loop during updates.

write pos records the current writing position and moves back after writing. When the end of the 4th file is written, it is rewritten from position 0.

The check point indicates the current position that can be erased. When the data is updated to the disk, the check point moves backward.

The position between write pos and check point is the space where update operations can be recorded. When the write pos catches up with the check point, no new operations can be performed, so let the check point write some data first.

You can set innodb_flush_log_at_trx_commit to 1 to enable redo log persistence.

binlog

Binlog is a log at the server layer, mainly used for archiving, and plays a role in backup, master-slave synchronization, and data recovery. Common log formats include row, mixed, and statement. For specific usage, please refer to Binlog Recovery Log.

You can enable binlog writing to disk by setting sync_binlog=1.

Here we distinguish between binlog and redo:

  1. The owner is different. Binlog is at the server level and can be used by all engines. redo log is unique to InnoDB.
  2. The types are different. Binlog is a logical log that records the original logic of the statement (binlog statement). Redo log is a physical log that records how a data page has been modified.
  3. The data is written in different ways. Binog logs are appended continuously, while redo logs are written in a loop.
  4. Different functions, binlog is used for archiving, while redo log is used to ensure crash-safety.

Two-phase commit

The following is the internal process of the executor and InnoDB when executing Update:

Take the update T set c=c+1 where ID=2; statement as an example:

  1. The executor uses the InooDB engine to find the row where the ID is located, and the ID is the primary key. The engine finds the row through tree search and returns it to the executor if the data page where the row is located is in memory. Otherwise it is read from disk into memory first and then returned.
  2. The executor gets the data from the engine, adds 1 to the C value, waits for a new row, and then rewrites the new data through the engine interface.
  3. The engine updates the row to memory, records the update operation in the redo log, and changes the status of the redo log to prepare. The executor is then told to commit the transaction at the appropriate time.
  4. The executor generates a binlog of this operation and writes the binlog to disk.
  5. The executor calls the transaction commit interface of the engine to change the redo log just written to the commit state, and the update is completed.

Light colors are executed by the executor, and dark colors are executed by the engine.

After updating the memory, writing the redo log is split into two steps: prepare and commit, which is often called a two-phase commit. Used to ensure data consistency when unexpected situations occur.

Let's assume here that what would happen if two-phase commit was not used?

  1. Write redo log first and then binlog. Suppose MySQL restarts abnormally after writing redo log, and binlog is not written at this time. After the restart, since the redolog has been written, there is no problem with the database content. However, at this time, if you want to use binlog for backup or recovery, you will find that the update logic of the last entry is missing, resulting in data inconsistency.
  2. Write binlog first and then redo log. After binlog is written, MySQL restarts abnormally and redo log is not written. After restarting, it is found that the redo log is not written successfully, and the transaction is considered invalid. However, the binlog has an additional update statement. After recovery, the data is naturally inconsistent.

Let's analyze the two-phase submission process:

1. The process crashes during the redo log prepare phase, at time A. After restarting, it was found that the redo log was not written, so the transaction was rolled back.

2. If the system crashes while writing binlog, after restart, it is found that binlog has not been written, so roll back the operation.

3. Binlog is written, but a crash occurs when submitting the redo log commit status

  • If the transaction in the redo log is complete and has a commit flag, commit it directly.
  • If there is only a complete prepare in the redo log, determine whether the corresponding binlog is complete.

Complete, commit the transaction. Incomplete, roll back the transaction.


How to determine whether the binlog is complete?

  • Statement format binlog, with COMMIT; mark
  • The binlog in row format will have an XID event.
  • After 5.6, there is also a binlog-checksum parameter to verify the correctness of binlog.

How to associate redo log and binlog to represent the same operation?

There is a common data field in the structure, XID. During crash recovery, the redo log is scanned sequentially:

  • If there is a prepare and a commit redo log, commit directly.
  • If there is only prepare but no commit redo log, use the XID to find the corresponding transaction in the binlog for judgment.

After the data is written, is there any relationship between the final disk location and the redo log?

  • For a normal running instance, after the page in memory is modified, it is inconsistent with the data page on the disk, which is called a dirty page. The process of flushing data to disk is to write the data pages in memory to disk.
  • In the crash scenario, InnoDB determines whether a data page has lost updates, reads it into memory, and then lets the redo log update the memory content. After the update is completed, the memory page becomes a dirty page and returns to the state of the first case.

What is the relationship between redo log buffer and redo log?

During the update process of a transaction, there are multiple SQL statements, so multiple logs need to be written.
However, during the writing process, the produced logs must be saved first, but cannot be written directly to the redo log before commit.
Therefore, the redo log buffer in the memory is used to store the redo log first. When committing, write the contents of the buffer to the redo log.

Summarize

At the beginning of the article, it is explained that the overall architecture of MySQL is divided into the server layer and the engine layer, and the execution process of a statement is briefly described. Then, MySQL chose InnoDB as the default engine after 5.5 because it has more transaction and crash-safe capabilities than the native MyISAM.

Crash-safe is achieved by redo log. Similar to redo log, there is binlog, which is the log of the Server engine and is used for archiving and backing up data.

Finally, it is mentioned that in order to ensure data consistency, the redo log and binlog are put into the same transaction, which is often referred to as the two-phase commit operation.

The above is the detailed introduction of MySQL overall architecture. For more information about MySQL overall architecture, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • MySQL dual-master (master-master) architecture configuration solution
  • In-depth understanding of Mysql logical architecture
  • MySQL 20 high-performance architecture design principles (worth collecting)
  • MySQL 4 common master-slave replication architectures
  • MySQL learning summary: a preliminary understanding of the architectural design of the InnoDB storage engine
  • Summary of MySQL Architecture Knowledge Points
  • Detailed explanation of Mysql logical architecture
  • MySQL database architecture details

<<:  Implementation of Docker deployment of MySQL cluster

>>:  WeChat applet realizes left-right linkage

Recommend

Solve the problem that IN subquery in MySQL will cause the index to be unusable

Today I saw a case study on MySQL IN subquery opt...

Install zip and unzip command functions under Linux and CentOS (server)

Install zip decompression function under Linux Th...

Detailed explanation of CocosCreator project structure mechanism

Table of contents 1. Project folder structure 1. ...

Detailed steps for completely uninstalling MySQL 5.7

This article mainly summarizes various problems o...

Docker image import and export code examples

Import and export of Docker images This article i...

How to import SQL files in Navicat Premium

I started working on my final project today, but ...

Three ways to align div horizontal layout on both sides

This article mainly introduces three methods of i...

Quickly get started with VUE 3 teleport components and usage syntax

Table of contents 1. Introduction to teleport 1.1...

Example of how to configure nginx to implement SSL

Environmental Description Server system: Ubuntu 1...

Several methods of calling js in a are sorted out and recommended for use

We often use click events in the a tag: 1. a href=...

Use three.js to achieve cool acid style 3D page effects

This article mainly introduces how to use the Rea...

How to hide the version number in Nginx

Nginx hides version number In a production enviro...

Detailed explanation of the 14 common HTTP status codes returned by the server

HTTP Status Codes The status code is composed of ...