Analysis of the principles of MySQL dirty page flushing and table space shrinking

MySQL dirty pages

Because of the WAL (write-ahead logging) mechanism, InnoDB performs only one disk operation when it processes an update statement: writing the log, which is the redo log. Once the memory page has been updated and the redo log written, InnoDB returns to the client and the update is considered successful.

The process of writing data from memory to disk is called a flush. Before the flush, the data page in memory differs from the data page on disk: the memory page already reflects the update, while the disk page still holds the old data. A memory page whose content differs from the disk page is called a dirty page. Once the memory page has been written to disk, the two are consistent and the page is called a clean page.
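The relationship between the redo log, dirty pages, and flushing can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class `ToyInnoDB` and its methods are hypothetical names, and real InnoDB pages and logs are far more complex.

```python
class ToyInnoDB:
    """Toy model of WAL and dirty pages; not InnoDB's real data structures."""

    def __init__(self, disk_pages):
        self.disk = dict(disk_pages)   # page_id -> value stored in the data file
        self.buffer = {}               # page_id -> value cached in memory
        self.redo_log = []             # append-only list of (page_id, new_value)

    def update(self, page_id, value):
        # WAL: record the change in the redo log and update the memory page.
        # The data file itself is not touched yet.
        self.redo_log.append((page_id, value))
        self.buffer[page_id] = value

    def is_dirty(self, page_id):
        # A page is dirty when its in-memory content differs from the disk copy.
        return page_id in self.buffer and self.buffer[page_id] != self.disk.get(page_id)

    def flush(self, page_id):
        # Flushing writes the memory page to disk, which makes it a clean page.
        self.disk[page_id] = self.buffer[page_id]


db = ToyInnoDB({1: "old"})
db.update(1, "new")
print(db.is_dirty(1))   # True: memory says "new", disk still says "old"
db.flush(1)
print(db.is_dirty(1))   # False: memory and disk agree again
```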

If MySQL occasionally runs very slowly for a moment, it is probably flushing dirty pages. The following situations trigger a flush:

  • When the redo log is full, the system stops all update operations and pushes the checkpoint forward to make room in the redo log for further writes.
  • When the system is short of memory and a new memory page is needed, some data pages must be evicted to make room. If an evicted page is dirty, it is written to disk first.
  • When MySQL is idle.
  • When MySQL shuts down normally.

The first case, a full redo log, is what InnoDB tries hardest to avoid: while it lasts, the entire system can accept no updates, which is unacceptable.

In the second case, memory is full and a dirty page must be written to disk before it can be evicted. InnoDB manages memory with the buffer pool, and each memory page in it is in one of three states:

  • Unused
  • Used and clean
  • Used and dirty (must be written to disk before it is evicted)

This is why database performance can sometimes drop suddenly: InnoDB is busy handling dirty pages.
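The memory-pressure case can be illustrated with a small sketch. It assumes a plain least-recently-used policy, which is a simplification of what InnoDB actually does, and `ToyBufferPool` is a hypothetical name used only for this example.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Hypothetical buffer pool with a fixed number of page slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> dirty flag, least recently used first
        self.flushed = []            # page ids that had to be written to disk on eviction

    def access(self, page_id, modify=False):
        if page_id in self.pages:
            # Already cached: it stays dirty if it was dirty before.
            dirty = self.pages.pop(page_id) or modify
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = modify
        self.pages[page_id] = dirty   # re-insert as most recently used

    def _evict(self):
        victim, dirty = self.pages.popitem(last=False)   # least recently used page
        if dirty:
            # A dirty page must be written to disk before its slot is reused.
            self.flushed.append(victim)
        # A clean page can simply be dropped.


pool = ToyBufferPool(capacity=2)
pool.access("p1", modify=True)   # p1 becomes a dirty page
pool.access("p2")                # p2 is a clean page
pool.access("p3")                # pool is full: p1 is evicted and must be flushed
pool.access("p4")                # p2 is evicted; it is clean, so no disk write
print(pool.flushed)              # ['p1']
```

Only the eviction of the dirty page costs a disk write, which is why evicting many dirty pages for one query slows that query down.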

Dirty page flushing control strategy

  • The innodb_io_capacity parameter tells InnoDB how much I/O your disk can handle. InnoDB uses this value when it calculates how fast to flush dirty pages.
  • InnoDB's flushing speed depends mainly on two factors: the dirty page ratio and the redo log write speed.
  • innodb_max_dirty_pages_pct is the upper limit of the dirty page ratio. The default is 75% in MySQL 5.7 (MySQL 8.0 raised it to 90%). Set innodb_io_capacity appropriately so that the dirty page ratio does not approach 75%.
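As a rough illustration of how the dirty page ratio could feed into a flush rate, consider the sketch below. The counters correspond to the MySQL status variables Innodb_buffer_pool_pages_dirty and Innodb_buffer_pool_pages_total, but the functions themselves are hypothetical and deliberately simplified: they ignore the redo log write speed and are not InnoDB's actual formula.

```python
def dirty_page_ratio(pages_dirty, pages_total):
    """Dirty page ratio in percent, e.g. computed from
    Innodb_buffer_pool_pages_dirty / Innodb_buffer_pool_pages_total."""
    return 100.0 * pages_dirty / pages_total


def flush_percent(ratio, max_dirty_pct=75.0):
    """Simplified assumption: flush at full speed once the ratio reaches
    the limit, and proportionally slower below it."""
    if ratio >= max_dirty_pct:
        return 100.0
    return 100.0 * ratio / max_dirty_pct


def pages_per_second(io_capacity, percent):
    """Scale the configured I/O capacity by the computed percentage."""
    return io_capacity * percent / 100.0


ratio = dirty_page_ratio(pages_dirty=3000, pages_total=8000)
percent = flush_percent(ratio)
print(ratio)                            # 37.5
print(percent)                          # 50.0
print(pages_per_second(200, percent))   # 100.0
```

The value 200 for io_capacity is only an example. If the configured value is much lower than what the disk can really do, flushing falls behind and dirty pages accumulate.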

Shrink the table space

Scenario example: the database occupies too much disk space, so half of the data in the largest table is deleted, yet the size of the table file remains unchanged.

Data Deletion Process

Suppose you delete a record R4 whose neighbors on the same data page have IDs 300 and 600. The InnoDB engine only marks R4 as deleted. If a record with an ID between 300 and 600 is inserted later, it can reuse this position, but the disk file does not get any smaller.

If all records on a data page are deleted, the data page can be reused.

Note: Data page reuse is different from record reuse.

  • Record reuse is limited to data that fits the position. After R4 is deleted, a newly inserted row with ID 400 can reuse the space directly, but a row with ID 800 cannot.
  • Page reuse has no such limit. After all records on a data page, Page A, are deleted, Page A is marked as reusable. If a new data page is then needed to insert a record with ID=50, Page A can be used for it.
  • If we use the delete command to delete all data in a table, every data page is marked as reusable, but the file on disk does not become smaller.
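The difference between record reuse and page reuse can be sketched as follows. `ToyPage` is a hypothetical model, and the IDs 300, 500, and 600 are assumed values chosen to match the example above; real InnoDB pages track free space differently.

```python
class ToyPage:
    """Toy data page: a fixed set of slots ordered by key."""

    def __init__(self, keys):
        self.slots = [{"key": k, "deleted": False} for k in sorted(keys)]

    def delete(self, key):
        for slot in self.slots:
            if slot["key"] == key and not slot["deleted"]:
                slot["deleted"] = True   # only marked; the slot stays in the page
                return True
        return False

    def try_reuse(self, new_key):
        """Reuse a delete-marked slot only if new_key fits between its neighbors.
        Simplification: slots at the page edges are treated as unbounded,
        which a real page, bounded by its key range, would not be."""
        for i, slot in enumerate(self.slots):
            if not slot["deleted"]:
                continue
            lower = self.slots[i - 1]["key"] if i > 0 else float("-inf")
            upper = self.slots[i + 1]["key"] if i + 1 < len(self.slots) else float("inf")
            if lower < new_key < upper:
                self.slots[i] = {"key": new_key, "deleted": False}
                return True
        return False

    def is_reusable_page(self):
        # A page whose records are all deleted can hold any keys again.
        return all(slot["deleted"] for slot in self.slots)


page = ToyPage([300, 500, 600])
page.delete(500)               # the record playing the role of R4
print(page.try_reuse(800))     # False: 800 does not belong between 300 and 600
print(page.try_reuse(400))     # True: 400 fits into the freed position
print(len(page.slots))         # 3: the page did not shrink at any point
```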

Data Insertion Process

If data is inserted in index order, the index stays compact. If it is inserted in random order, the index data pages may have to be split.

Suppose Page A is already full and another row must be inserted into it. Because Page A is full, inserting a row with ID 550 forces InnoDB to request a new page, Page B, to hold part of the data. After the page split is complete, a hole is left at the end of Page A.

Updating a value in an index can be understood as deleting the old value and inserting a new one, so it also creates holes.
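A minimal sketch of why random inserts leave holes while ordered inserts do not is shown below. The page capacity of 4 and the half-and-half split are assumptions made for the illustration; `insert_key` and `utilization` are hypothetical helpers, not InnoDB functions.

```python
import bisect

PAGE_CAPACITY = 4   # hypothetical number of records per page


def insert_key(pages, key):
    """pages is a list of sorted key lists kept in key order; it is modified in place."""
    # Find the last page whose first key is <= key (default: the first page).
    idx = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            idx = i
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return
    if idx == len(pages) - 1 and key > page[-1]:
        # Ordered insert at the end: start a fresh page, nothing is moved.
        pages.append([key])
        return
    # Page split: move the upper half of the full page to a new page.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    pages.insert(idx + 1, new_page)
    target = page if key < new_page[0] else new_page
    bisect.insort(target, key)


def utilization(pages):
    return sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)


ordered = [[]]
for k in range(1, 9):
    insert_key(ordered, k)
print(utilization(ordered))    # 1.0: both pages are completely full

split = [[300, 500, 600, 700]]
insert_key(split, 550)
print(split)                   # [[300, 500, 550], [600, 700]]
print(utilization(split))      # 0.625: both pages now contain holes
```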

Shrinking space

To remove the holes, create a new table B with the same structure as table A, then read the rows of table A one by one in ascending primary key order and insert them into table B. Table B has no holes, so its data pages are used more efficiently. If table B serves as a temporary table and replaces A once all data has been copied, the effect is the same as shrinking table A.
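The effect of copying rows in primary key order can be sketched like this. Holes are modeled as `None`, the page capacity is an assumed value, and `rebuild` is a hypothetical helper that stands in for the copy from table A to table B.

```python
def rebuild(pages, page_capacity):
    """Read all live records in key order and pack them into new, full pages."""
    live = sorted(key for page in pages for key in page if key is not None)
    return [live[i:i + page_capacity] for i in range(0, len(live), page_capacity)]


# Table A: three pages of capacity 4, with holes left by deletes and page splits.
table_a = [
    [100, None, 300, None],
    [None, 600, None, None],
    [700, 800, None, None],
]
table_b = rebuild(table_a, page_capacity=4)
print(table_b)                        # [[100, 300, 600, 700], [800]]
print(len(table_a), len(table_b))     # 3 2
```

The same five records now need two pages instead of three, which is the space the rebuild gives back.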

Table A cannot be updated at any point during this DDL process, so this DDL is not online. MySQL 5.6 introduced Online DDL, which changed the process to the following steps:

  • Create a temporary file and scan all data pages of table A.
  • Build a B+ tree from the records of table A found in those data pages and store it in the temporary file.
  • While the temporary file is being built, record all operations on table A in a log file.
  • After the temporary file has been built, apply the operations in the log file to it, producing a data file whose logical data is identical to table A.
  • Replace the data file of table A with the temporary file.
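The steps above can be sketched with dictionaries standing in for the table, the temporary file, and the log file. All names here (`apply_op`, `online_rebuild`, the operation tuples) are hypothetical, and the sketch runs sequentially, whereas the real process handles writes that arrive concurrently.

```python
def apply_op(rows, op):
    """Apply one (kind, key, value) operation to a dict of rows."""
    kind, key, value = op
    if kind == "delete":
        rows.pop(key, None)
    else:                          # "insert" or "update"
        rows[key] = value


def online_rebuild(table_a, concurrent_ops):
    # Scan table A as it looks when the rebuild starts.
    snapshot = dict(table_a)
    # Writes arriving during the rebuild still go to table A
    # and are also recorded in the log.
    row_log = []
    for op in concurrent_ops:
        apply_op(table_a, op)
        row_log.append(op)
    # Build the temporary file compactly, in key order, from the snapshot.
    temp_file = {key: snapshot[key] for key in sorted(snapshot)}
    # Replay the log so the temporary file catches up with table A.
    for op in row_log:
        apply_op(temp_file, op)
    return temp_file


table_a = {1: "a", 2: "b", 3: "c"}
ops = [("insert", 4, "d"), ("delete", 2, None), ("update", 1, "a2")]
rebuilt = online_rebuild(table_a, ops)
print(rebuilt)               # {1: 'a2', 3: 'c', 4: 'd'}
print(rebuilt == table_a)    # True: the logical data is identical
```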

Because the log file records operations and replays them, this approach, unlike the earlier copy-based process, allows inserts, deletes, and updates on table A while the table is being rebuilt.

Use the alter table A engine=InnoDB command to rebuild the table. In MySQL 5.5 and earlier, this command runs much like the copy-based process described earlier for temporary table B. The only difference is that you do not need to create table B yourself: MySQL automatically transfers the data, swaps the table names, and deletes the old table.
