Why the table file size remains unchanged after deleting data in MySQL

Why the table file size remains unchanged after deleting data in MySQL

For databases that have been running for a long time, there is often a problem that the table occupies too much storage space. However, after deleting many useless tables, the size of the table file has not changed. To solve this problem, you need to understand how InnoDB reclaims table space.

For a table, the space occupied is mainly divided into two parts: table structure and table data. Generally speaking, table structure definitions take up very little space. Therefore, the space problem is mainly related to the table data.

Before MySQL 8.0, table structures were stored in files with a .frm suffix. In 8.0, it is allowed to define table structures in system data tables.

About the storage of table data

Table data can be stored in a shared tablespace or in a separate file, controlled by innodb_file_per_table .

  • If it is OFF, it means that it exists in the system shared tablespace, together with the data dictionary
  • If ON, each InnoDB table structure is stored in a file with the suffix .idb

After 5.6.6, the default value is ON.

It is recommended to set this parameter to ON, so that when it is not needed, the system will directly delete the file through the drop table command.

But in a shared tablespace, even if the table is deleted, the space will not be reclaimed.

truncate = drop + create

Data Deletion Process

But sometimes when using delete to delete data, only some rows are deleted, but this may result in the table space not being reclaimed.

We know that MySQL InnoDB uses B+ tree as the structure for storing data, which is often called index-organized table, and the data is stored in pages.

When deleting data, there are two situations:

  • Delete some records in the data page
  • Delete the contents of the entire data page

For example, if you want to delete the record R4:

InnoDB directly marks the record R4 as deleted, which is called a reusable location. If a record with an ID between 300 and 700 is subsequently inserted, the position will be reused. It can be seen that the size of the disk file will not be reduced.

Furthermore, the reuse of records is limited to data that meets the scope conditions. If you want to insert a record with ID 800 later, the position of R4 cannot be reused.

For another example, if the contents of the entire data page are deleted, suppose R3 R4 R5 are deleted, which is Page A data page.

At this time, InnoDB will mark the entire Page A as deleted, and the entire data can be reused without any scope restrictions. For example, if you want to insert content with ID=50, you can reuse it directly.

And if the utilization rates of two adjacent data pages are very low, the data in the two pages will be merged into one of the pages, and the other page will be marked as reusable.

In summary, whether a data row or a data page is deleted, it is marked as deleted for reuse, so the file size will not be reduced. The corresponding specific operation is to use the delete command.

Moreover, we can also find that for the first case of deleting records, there will be many gaps due to the range limitation when reusing, such as deleting R4 but inserting ID=800.

Insertion operations also create gaps

When inserting data, if the data is inserted in ascending order of the index, the structure of the index will be compact. However, if it is inserted randomly, it is likely to cause the index data page to split.

For example, insert data into Page A which is full.

Since Page A is full, we need to apply for Page B. The process of adjusting Page A to Page B is also called page splitting.

After the end, there will be a gap in Page A.

In addition, for update operations, deleting first and then inserting will also cause gaps.

Furthermore, for tables that undergo a large number of additions, deletions, and modifications, there may be holes. If the holes are removed, natural space is released.

Use Rebuild Table

In order to remove the gaps in the table, you can rebuild a table B with the same structure as table A, and then insert the data into table B in ascending order of primary key ID.

Since the insertion is sequential, there are naturally no gaps in table B, and the utilization rate of the data pages is also higher. Then, table B was used to replace table A, which seemed to have the effect of shrinking the space of table A.

Specifically through:

alter table A engine=InnoDB

After version 5.5, the command is similar to the process mentioned above, and MySQL will complete the operations of data exchange, table name exchange, and deletion of old tables by itself.

But there is a problem. In DDL, table A cannot be updated. If data is written to table A at this time, data loss will occur.

Online DDL was introduced after version 5.6.

Online DDL

Online DDL has made the following updates based on it:

The process of rebuilding the table is as follows:

  1. Create a temporary file and scan all data pages of the primary key of Table A.
  2. Generate a B+ tree using the generated data pages and store it in a temporary file.
  3. When generating a temporary file, if there is an operation on A, it will be recorded in the log file, corresponding to state 2 in the figure.
  4. After the temporary file is generated, the log file is applied to the temporary file to obtain the same data file as table A, corresponding to state 3.
  5. Replace the data file of the A table with a temporary file.

Since the row log file exists, you can perform DML operations on table A during the reconstruction.

It should be noted that before the alter statement is executed, an MDL write lock is requested first, but it degenerates into an MDL read lock before copying data, thereby supporting DML operations.

As for why the MDL is not removed, it is to prevent other threads from performing DDL operations on this table at the same time.

For large tables, this operation consumes a lot of IO and CPU resources, so when performing online operations, the operation time must be controlled. To ensure safety, it is recommended to use gh-ost for migration.

Online and inplace

First, let's talk about the difference between inplace and copy:

In Online DDL, the reconstructed data of Table A is placed in tmp_file, which is a temporary file created inside InnoDB. The entire DDL is done inside InnoDB. Furthermore, for the server layer, no data is moved to the temporary table. It is an "in-place" operation, so it is called "inplace".

In the previous common DDL, the created table A is created by the server in tmp_table, so it is called "copy"

The corresponding sentence is actually:

-- alter table t engine=InnoDB The default is alter table t engine=innodb,ALGORITHM=inplace;

-- The process is server copying alter table t engine=innodb,ALGORITHM=copy;

It should be noted that inplace and online are not in a corresponding relationship:

  1. If the DDL process is online, it must be inplace
  2. If it is an inplace DDL, it should not be Online, such as adding full-text indexes and spatial indexes in <= 8.0.

expand

Let's talk about the differences between optimize, analyze, and alter table:

  1. alter table t engine = InnoDB (that is, recreate) uses the Oline DDL procedure by default.
  2. analyze table t does not rebuild the table. It only recalculates the index information of the table without modifying the data. During this process, an MDL read lock is added.
  3. optimize table t is equivalent to the operations in the previous two steps.

Using alter table in a transaction will automatically commit the transaction by default to maintain transaction consistency

Sometimes, after rebuilding a table, the space not only does not become smaller, but even becomes a little larger. This is because the rebuilt table itself has no gaps. During the DDL period, some DML executions happened to introduce some new gaps.

InnoDB will not fill the entire table, leaving 1/16 of each page for subsequent updates, so it may be compact at first, but there will be some gaps after the rebuild.

Summarize

Now we know that when using delete to delete data, the corresponding data row is not actually deleted. InnoDB only marks it as reusable, so the table space will not become smaller.

Generally speaking, there are two ways to mark reused space. One is to only mark certain locations in data pages as deleted, but such locations will only be used within a certain range, and gaps will appear.

The other is to mark the entire data page as reusable. Such a data page has no restrictions and can be reused directly.

To solve this problem, we can use the method of rebuilding the table. After version 5.6, the creation of the table already supports online operations, but it is finally used during the low peak of business.

The above is the details of why the table file size remains unchanged after MySQL deletes data. For more information about MySQL table file size, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • MyBatis batch insert/modify/delete MySql data
  • Implementation of mysql using mysqlbinlog command to restore accidentally deleted data
  • How to recover data after accidentally deleting ibdata files in mysql5.7.33
  • MySQL database deletes duplicate data and only retains one method instance
  • How to implement batch deletion of large amounts of data in MySQL large tables
  • Why MySQL does not recommend deleting data
  • Python script to batch delete tens of millions of data in MySQL
  • Mysql delete data and data table method example
  • Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL
  • Why the disk space is not released after deleting data in MySQL

<<:  Analyze the working principle of Tomcat

>>:  Vue's Render function

Recommend

Vue3 (V) Details of integrating HTTP library axios

Table of contents 1. Install axios 2. Use of axio...

Control the light switch with js

Use js to control the light switch for your refer...

Detailed explanation of the role of key in React

Table of contents Question: When the button is cl...

Tutorial on setting up scheduled tasks to backup the Oracle database under Linux

1. Check the character set of the database The ch...

Example analysis of the impact of MySQL index on sorting

This article uses examples to illustrate the impa...

Solve the problem of MySQL using not in to include null values

Notice! ! ! select * from user where uid not in (...

...

Steps for installing MySQL 8.0.16 on Windows and solutions to errors

1. Introduction: I think the changes after mysql8...

Brief analysis of centos 7 mysql-8.0.19-1.el7.x86_64.rpm-bundle.tar

Baidu Cloud Disk: Link: https://pan.baidu.com/s/1...

How to obtain a permanent free SSL certificate from Let's Encrypt in Docker

1. Cause The official cerbot is too annoying. It ...

MySQL 8.0.15 compressed version installation graphic tutorial

This article shares the installation method of My...

The "3I" Standards for Successful Print Advertising

For many domestic advertisers, the creation and ev...

CentOS6 upgrade glibc operation steps

Table of contents background Compile glibc 2.14 M...