The importance of data consistency and integrity to online business is self-evident. How can we ensure that data is not lost? Today we will discuss what improvements MySQL has made regarding data integrity and strong consistency. 1. MySQL two-phase commit In relational databases such as Oracle and MySQL, the write-ahead logging strategy is important. As long as the logs are persisted to disk, data can be guaranteed not to be lost after an abnormal restart of MySQL. In MySQL, when talking about logs, we have to mention redo log and binlog. 1. redo log Redo log, also known as redo log file, records in detail the modifications to the data rows in each data page, and records the values after the data is modified. Redo log is used for database crash recovery and is one of the most important functions to ensure data security. The redo log is written sequentially and cyclically. The file size and number of redo logs are controlled by the two parameters innodb_log_file_size and innodb_log_files_in_group. Before writing redo log to disk, it is first written to the redo log buffer, the size of which is controlled by innodb_log_buffer_size. How are logs persisted to disk after being written to the redo log buffer? In order to control the writing strategy of the redo log, Innodb adopts different strategies according to the different values of the innodb_flush_log_at_trx_commit parameter. It has three different values:
Among the three modes, 0 has the best performance, but it is not safe. Once the MySQL process crashes, one second of data will be lost. Option 1 has the highest security but has the greatest impact on performance. Option 2 mainly controls the disk flushing time by the operating system. If only MySQL crashes, it will not affect the data. If the host crashes abnormally, data will also be lost. 2. binlog binlog, also known as binary log, records all operations that change the MySQL database, excluding select and show operations. It mainly serves the functions of recovery, replication, and auditing. Binlog formats mainly include statement, row, and mixed. Statement: SQL statements based on operations are recorded in binlog. It is not recommended. Row: Based on the record of row changes, it will record the content before and after the row change. The row mode is also an important guarantee for the database to prevent data loss, and it is recommended to use it. Mixed: A mixture of the first two modes. It is not recommended. The writing logic of Binlog is also relatively simple: during the transaction execution, it is first written into the binlog cache, and then written into the binlog file when the transaction is committed. The binlog cache is controlled by the binlog_cache_size and max_binlog_size parameters. Each thread is assigned a binlog cache, but they share the binlog file. Binlog's mechanism for writing log files is controlled by sync_binlog:
Setting both innodb_flush_log_at_trx_commit and sync_binlog to 1 is the classic double-one mode in MySQL data, which ensures that the database does not lose data. MySQL data adopts the WAL mechanism to reduce the performance impact caused by each dirty data flush. If the "double one" strategy is set, will it affect the performance of the database? In fact, this is mainly due to the fact that both redo log and binlog are written sequentially. The sequential writing speed of the disk is much faster than the random writing speed. In addition, the group commit mechanism inside MySQL has greatly reduced the IOPS consumption of the disk. 3. Two-phase commit MySQL introduces two phase commit (2pc). MySQL will treat ordinary transactions as XA transactions (internal distributed transactions) and automatically assign a unique ID (XID) to each transaction. COMMIT will be passively divided into two phases: Prepare and Commit. Phase 1: Transaction Prepare Phase At this point, the SQL has been successfully executed, and xid information and redo and undo memory logs have been generated. Then call the prepare method to complete the first phase, set the transaction status to TRX_PREPARED, and flush the redo log to disk. Phase 2: Commit Phase If the first phase of the transaction enters the prepare phase, the generated binlog will be written to the file and flushed to disk. At this time, the transaction is bound to be committed. Specific abnormal scenario analysis: 1. If a transaction crashes in the prepare phase and the database is not written to the binary log and the storage engine is not committed during recovery, the transaction will be rolled back. 2. When a transaction crashes in the binlog phase, the log has not been successfully written to the disk. The transaction will be rolled back when it is started. 3. When a transaction crashes after the binlog has been fsync()ed to disk, but InnoDB has not had time to commit, the MySQL database will read the Xid_log_event of the binary log during recovery and tell InnoDB to commit the transactions with these XIDs. After InnoDB commits these transactions, it will roll back other transactions to keep the storage engine and the binary log consistent. MySQL's two-phase commit ensures that data is not lost after the database is restarted due to an abnormal shutdown. 2. Double Write As we said before, redo log, binlog, and two-phase commit ensure that data can be recovered through rollforward and rollback after an abnormal restart of MySQL. MySQL uses redo log for recovery. The redo log records the physical operations on the page, but there is a problem. If the page itself is wrong, such as a partial write problem (the page size is 16K. Suppose that when writing a dirty page in the memory to the database, 4K is written and the power is suddenly cut off. That is, the first two 4K are new and the last 12K are old, so this data page is incomplete and is a bad data page), the redo recovery will verify the integrity of the data page. At this time, the data page is already damaged, so the redo log cannot be used for recovery, and the data is lost. Double Write Principle: 1. When refreshing the dirty pages of the buffer pool, they are not written directly to the data file, but are first copied to the double write buffer. 2. Then write the double write buffer to the shared table space on disk twice, 1 MB each time. 3. Finally, write the data file from the double write buffer. Although data is always written twice, double writes are written sequentially, which actually sacrifices about 10% of system performance. This can solve the problem of partial write failure mentioned above, because there is a copy of the data page in the shared tablespace on disk. If the database crashes in the process of writing the page to the data file, when the instance is restored, the page copy can be found in the shared tablespace, copied to overwrite the original data page, and then the redo log can be applied. 3. Summary Today we talked about MySQL's two-phase commit and double write mechanisms, which respectively solved how MySQL can prevent data loss in the scenarios of MySQL crashing and restarting and partial page writing. So what should we do if our operating system crashes and cannot be started? What optimizations has MySQL made in the cluster architecture to ensure that data is not lost? In the next chapter, we will share with you the optimization and improvements of MySQL in the cluster architecture. You may also be interested in:
|
<<: Implementation of nginx flow control and access control
>>: Vue.js implements tab switching and color change operation explanation
Table of contents 1. Installation 2.APi 3. react-...
Table of contents 2. Comma operator 3. JavaScript...
1. Background Recently, some friends encountered ...
Configuring network connectivity for Linux system...
Table of contents Preface Creation steps Create a...
Beginners who are exposed to HTML learn some HTML...
Nginx's configuration syntax is flexible and ...
1. Install Docker First open the Linux environmen...
1. Install the cross-system file transfer tool un...
Table of contents Preface Optional Chaining Nulli...
MySQL replication table detailed explanation If w...
mycli MyCLI is a command line interface for MySQL...
The system environment is server2012 1. Download ...
Today we will make a simple case, using js and jq...
1. Download address: http://dev.mysql.com/downloa...