A brief discussion on the correct approach to MySQL table space recovery


You may have run into this situation: an online service continuously inserts, deletes, updates and queries rows in MySQL tables. Over time the tables accumulate more and more data, their data files grow larger and larger, and the disk space used by the database naturally keeps increasing.

To reduce the disk space taken up by the table data files, we used the delete command to remove the older half of the data in the largest business table. After the deletion, however, the table's data file on disk did not shrink. Even deleting every row in the table does not make the file smaller. Why is this?

This article analyzes this problem in detail and presents the correct way to reclaim table space.

Preliminary Notes

Currently, most MySQL databases use the InnoDB engine, so unless otherwise specified, the examples in this article are based on the InnoDB engine.

MySQL has a configuration option called innodb_file_per_table (enabled by default since MySQL 5.6.6). When it is set to 1, the data of each table is stored in its own file with the .ibd suffix.

If innodb_file_per_table is not enabled, table data is stored in the shared system tablespace. In that case, even if a table is dropped, the space it used is not released from the shared tablespace.

For this reason, innodb_file_per_table is usually set to 1. To make changes in a table's data file size easy to observe, all examples in this article also assume that this option is enabled.

Problem Reproduction

Create a new table ta with the following structure:

mysql> show create table ta\G
*************************** 1. row ***************************
       Table: ta
Create Table: CREATE TABLE `ta` (
  `id` int(11) NOT NULL,
  `ia` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Use the following stored procedure to batch-insert data into ta:

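delimiter //
-- Insert cnt rows into ta with id = ia = beg+1, beg+2, ..., beg+cnt
create procedure multinsert(in beg int, in cnt int)
begin
	declare icnt int default 0;   -- number of rows inserted so far
	declare tmp int default 0;    -- id/ia value of the current row
	while icnt < cnt do
		set icnt = icnt + 1;
		set tmp = beg + icnt;
		insert into ta(id,ia) values(tmp,tmp);
	end while;
end//

delimiter ;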
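To see exactly which ids the procedure generates, here is a small Python mirror of its loop. It is only an illustration, and `multinsert_ids` is a hypothetical helper, not part of MySQL. Because `icnt` is incremented before `tmp` is computed, `call multinsert(0,100000)` produces ids 1 through 100000. The later `delete ... where id between 1 and 50000` therefore removes exactly half of the rows.

```python
def multinsert_ids(beg: int, cnt: int) -> list[int]:
    """Mirror the loop of the multinsert stored procedure and return the ids it inserts."""
    ids = []
    icnt = 0
    while icnt < cnt:
        icnt += 1          # the procedure increments the counter first...
        tmp = beg + icnt   # ...so the first id is beg + 1
        ids.append(tmp)
    return ids


ids = multinsert_ids(0, 100000)
print(ids[0], ids[-1], len(ids))  # 1 100000 100000

# Rows matched by "delete from ta where id between 1 and 50000" (BETWEEN is inclusive).
deleted = [i for i in ids if 1 <= i <= 50000]
print(len(deleted))  # 50000
```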

Execute the call multinsert(0,100000) command in the MySQL console to insert 100,000 records into the ta table.

mysql> call multinsert(0,100000);
mysql> select count(*) from ta;
+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.02 sec)

Check the size of ta's data file, ta.ibd, on disk:

[root@ecs-centos-7 test]# cd /var/lib/mysql/test/
[root@ecs-centos-7 test]# ls -l ta.ibd 
-rw-r----- 1 mysql mysql 11534336 Jan  3 23:14 ta.ibd

The output shows that after 100,000 records are inserted into ta, ta.ibd is 11534336 bytes (11 MB).
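To compare sizes before and after a delete, it helps to express the byte count in megabytes and in InnoDB pages. The sketch below assumes the default innodb_page_size of 16 KB. `describe_size` and `ibd_size` are hypothetical helper names. Under that assumption, 11534336 bytes is exactly 11 MB, or 704 pages.

```python
import os

PAGE_SIZE = 16 * 1024  # assumes the default innodb_page_size of 16 KB


def describe_size(size_bytes: int) -> tuple[float, int]:
    """Convert a data-file size in bytes to (MB, number of InnoDB pages)."""
    return size_bytes / (1024 * 1024), size_bytes // PAGE_SIZE


def ibd_size(path: str) -> int:
    """Return the on-disk size of a .ibd file, like the size column of `ls -l`."""
    return os.path.getsize(path)


# The size reported by ls -l for ta.ibd.
print(describe_size(11534336))  # (11.0, 704)
```

On the server, you could call `describe_size(ibd_size("/var/lib/mysql/test/ta.ibd"))` before and after the delete and compare the two results.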

Now we use the delete command to delete half of the table data (50,000 rows)

mysql> delete from ta where id between 1 and 50000;
Query OK, 50000 rows affected (0.03 sec)
mysql> select count(*) from ta;
+----------+
| count(*) |
+----------+
|    50000 |
+----------+
1 row in set (0.02 sec)

After the deletion is complete, check the size of ta.ibd on the disk again.

[root@ecs-centos-7 test]# cd /var/lib/mysql/test/
[root@ecs-centos-7 test]# ls -l ta.ibd 
-rw-r----- 1 mysql mysql 11534336 Jan  3 23:14 ta.ibd

After half of the rows in ta (50,000 rows) have been deleted, ta.ibd is still 11534336 bytes (11 MB).

In other words, deleting data from ta did not shrink its data file on disk at all.

To understand why the data file is not shrinking, we need to understand the principle of deleting data.

How data deletion works

Data in InnoDB is organized in B+ trees. For more about B+ trees, see Understanding B+ Trees.

Figure (1)

Figure (1) is a schematic diagram of an InnoDB index. The nodes inside the dotted box belong to the data page Page1. The leaf nodes store the data for each index key and are kept in ascending order of the index key.

Suppose we want to delete the record with index key value 13 in Page1, shown in red in the figure above.

InnoDB only marks the node with index key value 13 as deleted. It does not reclaim the node's physical space; the node is simply flagged as deleted so that it can be reused later. This is why deleting table records does not shrink the data file on disk.

You might object that this example deletes only one node in Page1. If every node in Page1 is deleted, surely the space of Page1 is reclaimed?

The answer is no: it is not reclaimed either.

When all the records in Page1 are deleted, the whole data page is marked as deleted and can be reused as a whole. So in this case, too, the data file on disk does not shrink.
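A toy Python model makes this concrete. It is not InnoDB's real page format; `ToyTablespace` and `Page` are hypothetical names, and 16 KB pages are assumed. Deletion only flips a per-record flag. The file size depends on how many pages are allocated, so even a page whose records are all deleted still counts toward it.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 16 * 1024  # assumes the default 16 KB InnoDB page size


@dataclass
class Page:
    # key -> True if the record is delete-marked (its slot is reusable)
    records: dict[int, bool] = field(default_factory=dict)

    def live_count(self) -> int:
        return sum(1 for deleted in self.records.values() if not deleted)


@dataclass
class ToyTablespace:
    pages: list[Page] = field(default_factory=list)

    def file_size(self) -> int:
        # The data file is sized by allocated pages, not by live records.
        return len(self.pages) * PAGE_SIZE

    def delete(self, key: int) -> None:
        # Only mark the record; no page is handed back to the file system.
        for page in self.pages:
            if key in page.records:
                page.records[key] = True
                return


# Two pages holding keys 1-8.
ts = ToyTablespace([Page({k: False for k in range(1, 5)}),
                    Page({k: False for k in range(5, 9)})])
before = ts.file_size()
for key in range(1, 5):  # empty the first page completely
    ts.delete(key)
print(before, ts.file_size())              # 32768 32768
print([p.live_count() for p in ts.pages])  # [0, 4]
```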

Data reuse

Data reuse involves inserting, deleting and moving data nodes, as well as merging data pages. For the details of these operations, see Understanding B+ Trees; they are not repeated here.

  • Data node reuse

In Figure (1), when the node with index key value 13 is deleted, this node is marked as reusable.

If a record with an index key value between 7 and 18 is inserted later, the data node with the original index key value of 13 will be reused.

However, if the index key value of the record inserted later is not between 7 and 18, the data node with the original index key value of 13 may not be reused.

In other words, the reuse of data nodes requires that the index key value meets certain range conditions.

  • Data page reuse

In Figure (1), after all data nodes of Page1 are deleted, the entire Page1 becomes reusable. When an inserted record needs a new page, Page1 can be reused.

When adjacent data pages have low utilization, they may be merged into one of them. The other page then becomes empty, and that empty page is reusable.
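The two reuse rules can be summarized in a toy model. The names are hypothetical, and real InnoDB tracks free space inside pages and free pages per segment in more detail. A freed record slot helps only keys that belong in that page's key range. A completely empty page, whether it was emptied by deletes or by a merge, can serve any insert that needs a new page.

```python
from dataclasses import dataclass, field


@dataclass
class LeafPage:
    low: int        # smallest key this page may hold
    high: int       # largest key this page may hold
    capacity: int   # record slots in the page
    keys: set[int] = field(default_factory=set)

    def free_slots(self) -> int:
        return self.capacity - len(self.keys)


def reuse_target(pages: list[LeafPage], key: int) -> str:
    """Describe which free space an insert of `key` could reuse in this toy model."""
    # 1. A freed record slot only helps a key that belongs in that page's range.
    for page in pages:
        if page.keys and page.low <= key <= page.high:
            if page.free_slots() > 0:
                return f"reuses a free slot in page [{page.low}, {page.high}]"
            break  # the matching page is full: the insert needs another page
    # 2. A completely empty page can serve any insert that needs a new page.
    if any(not page.keys for page in pages):
        return "reuses an entirely empty page"
    return "needs a newly allocated page"


def merge(left: LeafPage, right: LeafPage) -> None:
    """Merge two under-used neighbours; `right` becomes empty and reusable."""
    if len(left.keys) + len(right.keys) <= left.capacity:
        left.keys |= right.keys
        left.high = right.high
        right.keys.clear()


page1 = LeafPage(7, 18, capacity=3, keys={7, 13, 18})
page2 = LeafPage(19, 40, capacity=3, keys={20, 30, 40})
pages = [page1, page2]

page1.keys.discard(13)              # delete key 13: page1 now has one free slot
print(reuse_target(pages, 10))      # reuses a free slot in page [7, 18]
print(reuse_target(pages, 25))      # needs a newly allocated page

page3 = LeafPage(41, 60, capacity=3, keys={45})
page4 = LeafPage(61, 80, capacity=3, keys={70})
pages += [page3, page4]
merge(page3, page4)                 # page4's only record moves into page3
print(reuse_target(pages, 25))      # reuses an entirely empty page
```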

Which operations will cause data holes?

When a record is deleted with the delete command, InnoDB only marks the corresponding data node as deleted and reusable. Each of these empty nodes waiting to be reused can be regarded as a data hole.

  • Deleting Data

When deleting data, data holes will be created. This has been explained before, so I won’t repeat it here.

  • Inserting Data

If the data is inserted in the order of index size, the data page is compact and no data holes will appear.

If data is inserted in the middle of an index, it may cause page splits, and holes may appear in the pages produced by the split. The following figure shows an example of a page split caused by an insertion.

As shown in the figure, the leaf page is full before the split, and its data is packed very compactly.

Now a record with index key value 15 is inserted. After the insertion, Page1 is split into two pages, Page1 and Page2, as shown in the figure above.

After the split, Page1 contains two holes. These two data nodes are reusable, while Page2 is full.

  • Updating Data

An update can be seen as a delete followed by an insert, so it may also create data holes.

For example, id is the primary key of table ta. The statement update ta set id = 10 where id = 1 changes id = 1 to id = 10. This is equivalent to first deleting the record with id = 1 and then inserting a record with id = 10, and it leaves a data hole.

However, a statement such as update ta set ia = ia + 1 where id = 1, which does not change the primary key value, does not create a hole.

Therefore, updating data may create data holes.
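The sketch below covers the two UPDATE cases in Python. It is a toy model, and `Record`, `update_ia`, `update_id` and `holes` are hypothetical names. Changing a non-key column rewrites the record in place, assuming the new value fits in the old record as a fixed-size int does. Changing the primary key delete-marks the old record and inserts a new one at its new position in key order, which leaves a hole behind.

```python
from dataclasses import dataclass


@dataclass
class Record:
    id: int
    ia: int
    deleted: bool = False


def update_ia(records: list[Record], id_: int, delta: int) -> None:
    """update ta set ia = ia + delta where id = id_: modified in place, no hole."""
    for rec in records:
        if rec.id == id_ and not rec.deleted:
            rec.ia += delta


def update_id(records: list[Record], old: int, new: int) -> None:
    """update ta set id = new where id = old: delete-mark the old record, insert a new one."""
    target = next((r for r in records if r.id == old and not r.deleted), None)
    if target is None:
        return
    target.deleted = True                    # the old slot becomes a hole
    records.append(Record(new, target.ia))
    records.sort(key=lambda r: r.id)         # keep records in primary-key order


def holes(records: list[Record]) -> int:
    return sum(r.deleted for r in records)


rows = [Record(i, i) for i in range(1, 6)]
update_ia(rows, 1, 1)
print(holes(rows))                           # 0
update_id(rows, 1, 10)
print(holes(rows), [r.id for r in rows])     # 1 [1, 2, 3, 4, 5, 10]
```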

In summary, inserts, deletes and updates can all create data holes. Online services run large numbers of these operations against their tables, so data holes are quite likely to appear.
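To see why random inserts leave holes while ordered inserts do not, here is a toy Python simulation of a B+ tree leaf level. It is hypothetical and simplified: each page holds 100 records. Appending past the last key starts a new page, and any other overflow splits the page in half. This loosely mirrors how InnoDB avoids half-empty pages on sequential inserts, but its real split-point heuristics differ. The script inserts the same 10,000 keys in order and in random order.

```python
import bisect
import random

CAPACITY = 100  # records per page in this toy model


def insert(pages: list[list[int]], key: int) -> None:
    """Insert key into a toy leaf level made of sorted pages, splitting full pages."""
    if not pages:
        pages.append([key])
        return
    firsts = [p[0] for p in pages]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    page = pages[i]
    bisect.insort(page, key)
    if len(page) <= CAPACITY:
        return
    if i == len(pages) - 1 and key == page[-1]:
        # Appending past the last key: start a new page, leave this one full.
        pages.append([page.pop()])
    else:
        # Inserting in the middle: split the page in half (simplified policy).
        mid = len(page) // 2
        pages.insert(i + 1, page[mid:])
        del page[mid:]


def fill_ratio(pages: list[list[int]]) -> float:
    return sum(len(p) for p in pages) / (len(pages) * CAPACITY)


keys = list(range(10_000))
sequential: list[list[int]] = []
for k in keys:
    insert(sequential, k)

random.seed(1)
shuffled = keys[:]
random.shuffle(shuffled)
scattered: list[list[int]] = []
for k in shuffled:
    insert(scattered, k)

print(len(sequential), round(fill_ratio(sequential), 2))  # 100 1.0
print(len(scattered), round(fill_ratio(scattered), 2))
```

The shuffled run needs more pages than the ordered run. Pages left half full by middle splits are only partly refilled, and the unused slots are exactly the holes described above.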

How to shrink a tablespace

A table that has gone through many scattered inserts, deletes and updates will therefore contain many data holes.

What if we create a new table with the same structure as the original one and insert all data from the old table into it in ascending index order? Once all the data has been copied, we drop the old table and rename the new table to the old table's name.

Because the leaf node data of the new table is added in order, its pages are compact and highly utilized, so it needs far fewer pages than the old table. The holes in the old table's indexes do not exist in the new table. The disk space occupied by the new table's data file therefore shrinks, which achieves the goal of shrinking the table space.
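The rebuild idea can be sketched in a few lines of Python. This is a toy model: `rebuild` is a hypothetical helper, each page holds 100 records, 16 KB pages are assumed, and pages are packed completely full for simplicity. The note later in this article explains that InnoDB actually leaves some free space on each page. Copying the live keys in ascending order into fresh pages removes every hole.

```python
PAGE_SIZE = 16 * 1024  # assumes the default 16 KB innodb_page_size
CAPACITY = 100         # records per page in this toy model


def rebuild(pages: list[list[int]], capacity: int = CAPACITY) -> list[list[int]]:
    """Copy all live keys, in ascending order, into freshly packed pages."""
    live = sorted(key for page in pages for key in page)
    return [live[i:i + capacity] for i in range(0, len(live), capacity)]


# An "old" table: 200 pages, each only 30% full after many scattered deletes.
old = [list(range(p * 100, p * 100 + 30)) for p in range(200)]
new = rebuild(old)
print(len(old), len(new))                          # 200 60
print(len(old) * PAGE_SIZE, len(new) * PAGE_SIZE)  # 3276800 983040
```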

Below are several ways to shrink table space. Although the methods differ, they all work on the same principle: rebuilding the table.

  • truncate table table_name

This operation is equivalent to drop + create: the table is dropped, and then a new, empty table with the same name is created. Because truncate table discards all rows, you must save a copy of the old table's data before running it, and import that data into the new table after the command finishes.

  • alter table table_name engine = InnoDB

This operation traverses the data pages of the old table's primary key index, builds a new B+ tree from the records in those pages, and stores it in a temporary file on disk. When the traversal is complete, the temporary file replaces the old table's data file.

Starting with MySQL 5.6, this operation runs as Online DDL. Note that it has to scan the entire table data file, which is very time-consuming for large tables. On online services, avoid business peak hours and proceed with caution.

Note:
When rebuilding a table, InnoDB does not fill each page completely. It leaves about 1/16 of each page free for subsequent updates, so the table is not at its most compact after a rebuild.

Consider the following sequence: rebuild table t once, then insert some data that uses up part of the reserved space. If you then rebuild table t again, the rebuilt table may occupy more space than it did before this second rebuild.

Summary

This article started from a practical problem: it reproduced the problem, analyzed its cause and presented a solution, examining each step in detail. Because of limited space, some details are not covered in depth, and readers are encouraged to explore them on their own.

This concludes this article on the correct approach to MySQL tablespace recovery. For more relevant MySQL tablespace recovery content, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!
