Why MySQL does not recommend deleting data

Preface

I was responsible for several systems whose business volume kept growing, and the amount of data stored in MySQL increased dramatically. The business side, in what felt like a sneak attack, quickly grew many tables to the billion-row level before I could react. Caught off guard, I watched the SQL joining those tables become very slow, the response time of the application interfaces lengthen, and the user experience suffer.

Afterwards, I tracked down the business team, gave them a stern talking-to about playing fair, and they promptly apologized; the matter was dropped. On my way out I told them not to do it again and to reflect on their mistake.

No matter how much you grumble, the problem still has to be solved. When I analyzed the cause, I found that the data volume of some tables was growing very fast and the corresponding SQL scanned a lot of invalid data, which slowed the SQL down. After confirmation, these large tables all held flow, record, and log style data that only needed to be retained for 1 to 3 months, so the table data had to be cleaned up to slim the tables down. Most people's first instinct is to clean up with the insert + delete approach.

In this article, I will explain why deleting data is not recommended, covering the layout of InnoDB storage space, the impact of delete on performance, and optimization suggestions.

InnoDB storage architecture

The InnoDB storage structure consists of two parts: a logical storage structure and a physical storage structure.

Logically, InnoDB is organized as tablespace -> segment (inode) -> extent -> data page. The logical management unit is the segment, and the smallest unit of space allocation is the extent (64 contiguous 16KB pages, i.e. 1MB by default). Each segment is first allocated 32 fragment pages from the tablespace's FREE_PAGE list; when those are not enough, it grows by the following rules: if the segment holds less than 1 extent, it is grown to 1 extent; while the tablespace is smaller than 32MB, one extent is added at a time; once the tablespace exceeds 32MB, 4 extents are added at a time.

Physically, it consists of system and user data files plus log files. The data files store the MySQL dictionary data and user data; the log files record changes to data pages and are used for recovery when MySQL crashes.

InnoDB tablespaces

InnoDB storage includes three types of tablespaces: system tablespace, user tablespace, and Undo tablespace.

**System tablespace:** mainly stores MySQL internal data dictionary data, such as the data under information_schema.

**User tablespace:** when innodb_file_per_table=1 is enabled, table data is separated from the system tablespace and stored in a data file named table_name.ibd, while the structure information is stored in the table_name.frm file.

**Undo tablespace:** stores undo information; for example, consistent snapshot reads and flashback both rely on undo information.
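As a quick way to check which tablespace a table lives in, you can query the InnoDB data dictionary (a sketch against MySQL 5.7, whose INNODB_SYS_* views this article queries elsewhere; in 8.0 the view is renamed INNODB_TABLESPACES):

mysql> select SPACE, NAME, SPACE_TYPE from information_schema.INNODB_SYS_TABLESPACES where NAME like 'test/%';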

Starting from MySQL 5.7, users are allowed to create general tablespaces themselves (MySQL 8.0 extends this further, for example to undo tablespaces). The full syntax is as follows; note that some clauses, such as USE LOGFILE GROUP and NODEGROUP, apply only to NDB:

CREATE TABLESPACE tablespace_name
  ADD DATAFILE 'file_name'                # data file name
  [USE LOGFILE GROUP logfile_group]       # custom log file group, usually 2 log files per group (NDB only)
  [EXTENT_SIZE [=] extent_size]           # extent size
  [INITIAL_SIZE [=] initial_size]         # initial size
  [AUTOEXTEND_SIZE [=] autoextend_size]   # auto-extend size
  [MAX_SIZE [=] max_size]                 # maximum size of a single file, at most 32G
  [NODEGROUP [=] nodegroup_id]            # node group (NDB only)
  [WAIT]
  [COMMENT [=] comment_text]
  ENGINE [=] engine_name

The advantage of this is that you can separate hot and cold data and store them on SSD and HDD respectively, which achieves efficient data access while saving costs. For example, you can add two 500G disks, create a volume group vg, carve out logical volumes lv, create data directories and mount the corresponding lv. Assume the two resulting directories are /hot_data and /cold_data.

In this way, core business tables such as user tables and order tables can be stored on high-performance SSD disks, and some logs and flow tables can be stored on ordinary HDDs. The main operation steps are as follows:

# Create the hot data tablespace
create tablespace tbs_data_hot add datafile '/hot_data/tbs_data_hot01.dbf' max_size 20G;

# Create core business tables in the hot data tablespace
create table booking(id bigint not null primary key auto_increment, …… ) tablespace tbs_data_hot;

# Create the cold data tablespace (note the datafile goes under /cold_data)
create tablespace tbs_data_cold add datafile '/cold_data/tbs_data_cold01.dbf' max_size 20G;

# Create log, journal, and backup tables in the cold data tablespace
create table payment_log(id bigint not null primary key auto_increment, …… ) tablespace tbs_data_cold;

# A table can later be moved to another tablespace if needed
alter table payment_log tablespace tbs_data_hot;

InnoDB storage distribution

Create an empty table to view space changes

mysql> create table user(id bigint not null primary key auto_increment, 
  -> name varchar(20) not null default '' comment 'Name', 
  -> age tinyint not null default 0 comment 'age', 
  -> gender char(1) not null default 'M' comment 'Gender',
  -> phone varchar(16) not null default '' comment 'Phone number',
  -> create_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  -> update_time datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modification time'
  -> ) engine = InnoDB DEFAULT CHARSET=utf8mb4 COMMENT 'User information table';
Query OK, 0 rows affected (0.26 sec)
# ls -lh user.ibd
-rw-r----- 1 mysql mysql 96K Nov 6 12:48 user.ibd

When the parameter innodb_file_per_table is set to 1, creating a table automatically creates its segment and allocates 32 fragment pages to store data, which is why the empty table above occupies only 96KB. Once those pages are used up, InnoDB requests extents of 64 contiguous pages at a time. This lets small tables and undo segments start with very little space, saving disk capacity overhead.

# python2 py_innodb_page_info.py -v /data2/mysql/test/user.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0000>
page offset 00000000, page type <Freshly Allocated Page>
page offset 00000000, page type <Freshly Allocated Page>
Total number of page: 6        # total number of allocated pages
Freshly Allocated Page: 2      # available data pages
Insert Buffer Bitmap: 1        # insert buffer bitmap page
File Space Header: 1           # file space header
B-tree Node: 1                 # data page
File Segment inode: 1          # file segment inode; on ibdata1.ibd there would be multiple inodes

Space changes after inserting data

mysql> DELIMITER $$
mysql> CREATE PROCEDURE insert_user_data(num INTEGER) 
  -> BEGIN
  -> DECLARE v_i int unsigned DEFAULT 0;
  -> set autocommit = 0;
  -> WHILE v_i < num DO
  -> insert into user(`name`, age, gender, phone) values (CONCAT('lyn',v_i), mod(v_i,120), 'M', CONCAT('152',ROUND(RAND(1)*100000000)));
  -> SET v_i = v_i+1;
  -> END WHILE;
  -> commit;
  -> END $$
Query OK, 0 rows affected (0.01 sec)
mysql> DELIMITER ;

# Insert 100,000 rows
mysql> call insert_user_data(100000);
Query OK, 0 rows affected (6.69 sec)
# ls -lh user.ibd
-rw-r----- 1 mysql mysql 14M Nov 6 10:58 /data2/mysql/test/user.ibd

# python2 py_innodb_page_info.py -v /data2/mysql/test/user.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001> #A non-leaf node is added, and the height of the tree changes from 1 to 2.
........................................................
page offset 00000000, page type <Freshly Allocated Page>
Total number of pages: 896:
Freshly Allocated Page: 493
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 400
File Segment inode: 1

Space changes after deleting data

mysql> select min(id),max(id),count(*) from user;
+---------+---------+----------+
| min(id) | max(id) | count(*) |
+---------+---------+----------+
| 1 | 100000 | 100000 |
+---------+---------+----------+
1 row in set (0.05 sec)
# Delete 50,000 rows. Theoretically, the space should shrink from 14MB to about 7MB.
mysql> delete from user limit 50000;
Query OK, 50000 rows affected (0.25 sec)

 
#The data file size is still 14MB and has not shrunk.
# ls -lh /data2/mysql/test/user.ibd
-rw-r----- 1 mysql mysql 14M Nov 6 13:22 /data2/mysql/test/user.ibd

#Data pages are not recycled.
# python2 py_innodb_page_info.py -v /data2/mysql/test/user.ibd
page offset 00000000, page type <File Space Header>
page offset 00000001, page type <Insert Buffer Bitmap>
page offset 00000002, page type <File Segment inode>
page offset 00000003, page type <B-tree Node>, page level <0001>
........................................................
page offset 00000000, page type <Freshly Allocated Page>
Total number of pages: 896:
Freshly Allocated Page: 493
Insert Buffer Bitmap: 1
File Space Header: 1
B-tree Node: 400
File Segment inode: 1
#Inside MySQL, it is marked for deletion.
mysql> use information_schema;

Database changed
mysql> SELECT A.SPACE AS TBL_SPACEID, A.TABLE_ID, A.NAME AS TABLE_NAME, FILE_FORMAT, ROW_FORMAT, SPACE_TYPE, B.INDEX_ID, B.NAME AS INDEX_NAME, PAGE_NO, B.TYPE AS INDEX_TYPE FROM INNODB_SYS_TABLES A LEFT JOIN INNODB_SYS_INDEXES B ON A.TABLE_ID = B.TABLE_ID WHERE A.NAME = 'test/user';
+-------------+----------+------------+-------------+------------+--------------+------------+------------+---------+------------+
| TBL_SPACEID | TABLE_ID | TABLE_NAME | FILE_FORMAT | ROW_FORMAT | SPACE_TYPE | INDEX_ID | INDEX_NAME | PAGE_NO | INDEX_TYPE |
+-------------+----------+------------+-------------+------------+--------------+------------+------------+---------+------------+
| 1283 | 1207 | test/user | Barracuda | Dynamic | Single | 2236 | PRIMARY | 3 | 3 |
+-------------+----------+------------+-------------+------------+--------------+------------+------------+---------+------------+
1 row in set (0.01 sec)

PAGE_NO = 3 indicates that the root page of the B-tree is page 3, and INDEX_TYPE = 3 means a clustered index. The INDEX_TYPE values are as follows:
  • 0 = nonunique secondary index
  • 1 = automatically generated clustered index (GEN_CLUST_INDEX)
  • 2 = unique nonclustered index
  • 3 = clustered index
  • 32 = full-text index

Internally, MySQL does not physically remove deleted rows; it only marks them as deleted, changing the record's delflag from N to Y. After the transaction commits, the purge thread cleans the records up and puts the freed slots on the page's delete list. If the record inserted next is larger than the deleted one, the freed space is not reused; if it is smaller than or equal to the deleted record, the space can be reused. This behavior can be analyzed with Zhishutang's innblock tool.
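A rough, indirect way to watch this behavior is the History list length figure in the InnoDB status output, which counts undo log units not yet purged; it grows after large deletes and shrinks as purge catches up (a general observation, not a substitute for the innblock analysis above):

# Look for "History list length" in the TRANSACTIONS section of the output
mysql> show engine innodb status\G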

Fragmentation in InnoDB

How fragmentation arises

We know that data stored on a file system can never use 100% of the physical space allocated to it. Deleting rows leaves "holes" in pages, and random writes (inserts that do not increase monotonically on the clustered index) cause page splits, after which a page may be less than 50% full. In addition, inserts, deletes and updates on the table translate into random modifications of the corresponding secondary index values, which punch the same kind of "holes" into the index pages. Although these holes may be reused later, some physical space always ends up unused; that is fragmentation.

At the same time, even with a 100% fill factor, InnoDB proactively leaves 1/16 of each clustered index page free (about 1KB of a 16KB page) for future index growth, to prevent row overflow caused by updates.

mysql> select table_schema,
  -> table_name,ENGINE,
  -> round(DATA_LENGTH/1024/1024+ INDEX_LENGTH/1024/1024) total_mb,TABLE_ROWS,
  -> round(DATA_LENGTH/1024/1024) data_mb, round(INDEX_LENGTH/1024/1024) index_mb, round(DATA_FREE/1024/1024) free_mb, round(DATA_FREE/DATA_LENGTH*100,2) free_ratio
  -> from information_schema.TABLES where TABLE_SCHEMA= 'test'
  -> and TABLE_NAME = 'user';
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
| test | user | InnoDB | 4 | 50000 | 4 | 0 | 6 | 149.42 |
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)

DATA_FREE is the number of allocated bytes not currently in use; it does not mean that all of this space is fragmentation.

Reclaiming fragmentation

For an InnoDB table, the following command can be used to reclaim fragmentation and free up space. It rebuilds the table with heavy random-read IO, which is time-consuming, blocks normal DML on the table, and temporarily needs extra disk space roughly the size of the table. On RDS it can fill the disk in an instant, lock the instance, and leave the application unable to run DML, so never execute it casually in a production environment.
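For reference, OPTIMIZE TABLE does the same job for InnoDB: MySQL maps it to a table rebuild plus an analyze (you will see a note that it is "doing recreate + analyze instead"), so exactly the same warnings apply:

mysql> optimize table user;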

# Reclaim InnoDB fragmentation
mysql> alter table user engine=InnoDB;
Query OK, 0 rows affected (9.00 sec)
Records: 0 Duplicates: 0 Warnings: 0

# After execution, the data file shrinks from 14MB to 10MB.
# ls -lh /data2/mysql/test/user.ibd
-rw-r----- 1 mysql mysql 10M Nov 6 16:18 /data2/mysql/test/user.ibd

mysql> select table_schema, table_name,ENGINE, round(DATA_LENGTH/1024/1024+ INDEX_LENGTH/1024/1024) total_mb,TABLE_ROWS, round(DATA_LENGTH/1024/1024) data_mb, round(INDEX_LENGTH/1024/1024) index_mb, round(DATA_FREE/1024/1024) free_mb, round(DATA_FREE/DATA_LENGTH*100,2) free_ratio from information_schema.TABLES where TABLE_SCHEMA= 'test' and TABLE_NAME= 'user';
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
| table_schema | table_name | ENGINE | total_mb | TABLE_ROWS | data_mb | index_mb | free_mb | free_ratio |
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
| test | user | InnoDB | 5 | 50000 | 5 | 0 | 2 | 44.29 |
+--------------+------------+--------+----------+------------+---------+----------+---------+------------+
1 row in set (0.00 sec)

The impact of delete on SQL

SQL execution status before deletion

# Insert 1 million rows
mysql> call insert_user_data(1000000);
Query OK, 0 rows affected (35.99 sec)

# Add the relevant indexes
mysql> alter table user add index idx_name(name), add index idx_phone(phone);
Query OK, 0 rows affected (6.00 sec)
Records: 0 Duplicates: 0 Warnings: 0

# Index statistics on the table
mysql> show index from user;
+-------+------------+-----------+--------------+--------------+---------------+-------------+------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+--------------+---------------+-------------+------+--------+------+------------+---------+---------------+
| user | 0 | PRIMARY | 1 | id | A | 996757 | NULL | NULL | | BTREE | | |
| user | 1 | idx_name | 1 | name | A | 996757 | NULL | NULL | | BTREE | | |
| user | 1 | idx_phone | 1 | phone | A | 2 | NULL | NULL | | BTREE | | |
+-------+------------+-----------+--------------+--------------+---------------+-------------+------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

# Reset the status variable counters
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

# Execute the SQL statement
mysql> select id, age, phone from user where name like 'lyn12%';
+--------+-----+-------------+
| id | age | phone |
+--------+-----+-------------+
| 124 | 3 | 15240540354 |
| 1231 | 30 | 15240540354 |
| 12301 | 60 | 15240540354 |
.............................
| 129998 | 37 | 15240540354 |
| 129999 | 38 | 15240540354 |
| 130000 | 39 | 15240540354 |
+--------+-----+-------------+
11111 rows in set (0.03 sec)

mysql> explain select id, age ,phone from user where name like 'lyn12%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
| 1 | SIMPLE | user | range | idx_name | idx_name | 82 | NULL | 22226 | Using index condition |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
1 row in set (0.00 sec)

# View the relevant status variables
mysql> select * from information_schema.session_status where variable_name in('Last_query_cost','Handler_read_next','Innodb_pages_read','Innodb_data_reads','Innodb_pages_read');
+-------------------+----------------+
| VARIABLE_NAME     | VARIABLE_VALUE |
+-------------------+----------------+
| HANDLER_READ_NEXT | 11111          |  # number of rows requested to be read
| INNODB_DATA_READS | 7868409        |  # total physical data reads
| INNODB_PAGES_READ | 7855239        |  # total logical reads
| LAST_QUERY_COST   | 10.499000      |  # cost of the SQL statement, mainly IO_COST plus CPU_COST
+-------------------+----------------+
4 rows in set (0.00 sec)

SQL execution after deletion

# Delete 500,000 rows
mysql> delete from user limit 500000;
Query OK, 500000 rows affected (3.70 sec)

# Re-collect the table statistics
mysql> analyze table user;
+-----------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-----------+---------+----------+----------+
| test.user | analyze | status | OK |
+-----------+---------+----------+----------+
1 row in set (0.01 sec)

# Reset the status variable counters
mysql> flush status;
Query OK, 0 rows affected (0.01 sec)

mysql> select id, age ,phone from user where name like 'lyn12%';
Empty set (0.05 sec)

mysql> explain select id, age ,phone from user where name like 'lyn12%';
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
| 1 | SIMPLE | user | range | idx_name | idx_name | 82 | NULL | 22226 | Using index condition |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+-----------------------+
1 row in set (0.00 sec)

mysql> select * from information_schema.session_status where variable_name in('Last_query_cost','Handler_read_next','Innodb_pages_read','Innodb_data_reads','Innodb_pages_read');
+-------------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-------------------+----------------+
| HANDLER_READ_NEXT | 0 |
| INNODB_DATA_READS | 7868409 |
| INNODB_PAGES_READ | 7855239 |
| LAST_QUERY_COST | 10.499000 |
+-------------------+----------------+
4 rows in set (0.00 sec)

Statistical analysis of results

+----------------------------------+-----------+----------------+---------------+--------------+---------------+----------------+
| Operation                        | Cost      | Physical reads | Logical reads | Rows scanned | Rows returned | Execution time |
+----------------------------------+-----------+----------------+---------------+--------------+---------------+----------------+
| Initial insert of 1,000,000 rows | 10.499000 | 7868409        | 7855239       | 22226        | 11111         | 30ms           |
| Randomly delete 500,000 of them  | 10.499000 | 7868409        | 7855239       | 22226        | 0             | 50ms           |
+----------------------------------+-----------+----------------+---------------+--------------+---------------+----------------+

This shows that for a typical large table, slimming it down by deleting rows is unrealistic: the query still scans the same pages and gets no faster. So do not reach for delete whenever you need to remove data; use graceful mark deletion instead.

Optimization suggestions for delete

Control business account permissions

For a large system, it is necessary to split subsystems by business domain; each subsystem can be regarded as a service. The Meituan APP, for example, is backed by many services, the core ones including user-service, search-service, product-service, location-service, price-service, and so on. Each service maps to one database, which gets its own dedicated account that is granted only select, insert, and update privileges (no delete), and cross-database access is prohibited.

# Create the user database and grant privileges
create database mt_user charset utf8mb4;
grant USAGE, SELECT, INSERT, UPDATE ON mt_user.* to 'w_user'@'%' identified by 't$W*g@gaHTGi123456';
flush privileges;
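A quick sanity check that the account really has no delete privilege (the grant output wording varies by MySQL version):

mysql> show grants for 'w_user'@'%';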

Convert delete into mark deletion

There are four common fields in the MySQL modeling conventions that essentially every table should carry, shown in the snippet below. At the same time, creating an index on the create_time column (a sketch follows the snippet) has two benefits:

  • Some query scenarios carry a default time window, such as the last 7 days or one month, filtered by create_time; an index makes these scans faster.
  • Some core business tables are extracted into the data warehouse in T+1 fashion; for example, the previous day's data is pulled at 00:30 every night, always filtered by create_time.
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
`is_deleted` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether logically deleted: 0: not deleted, 1: deleted',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
`update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modification time'
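A sketch of the index the two points above rely on; idx_create_time is an assumed name following the document's idx_ convention:

alter table user add index idx_create_time (create_time);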

# With the delete marker, the business layer's delete operation becomes an update
update user set is_deleted = 1 where user_id = 1213;

# Queries then need to carry the is_deleted filter
select id, age, phone from user where is_deleted = 0 and name like 'lyn12%';

Data archiving method

Common data archiving methods

#1. Create an archive table, usually by adding _bak after the original table name.
CREATE TABLE `ota_order_bak` (
 `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
 `order_id` varchar(255) DEFAULT NULL COMMENT 'Order id',
 `ota_id` varchar(255) DEFAULT NULL COMMENT 'ota',
 `check_in_date` varchar(255) DEFAULT NULL COMMENT 'Check-in date',
 `check_out_date` varchar(255) DEFAULT NULL COMMENT 'Check-out date',
 `hotel_id` varchar(255) DEFAULT NULL COMMENT 'Hotel ID',
 `guest_name` varchar(255) DEFAULT NULL COMMENT 'Customer',
 `purcharse_time` timestamp NULL DEFAULT NULL COMMENT 'Purchase time',
 `create_time` datetime DEFAULT NULL,
 `update_time` datetime DEFAULT NULL,
 `create_user` varchar(255) DEFAULT NULL,
 `update_user` varchar(255) DEFAULT NULL,
 `status` int(4) DEFAULT '1' COMMENT 'Status: 1 normal, 0 deleted',
 `hotel_name` varchar(255) DEFAULT NULL,
 `price` decimal(10,0) DEFAULT NULL,
 `remark` longtext,
 PRIMARY KEY (`id`,`create_time`), # the partition key must be part of every unique key, or MySQL rejects the partitioning
 KEY `IDX_order_id` (`order_id`) USING BTREE,
 KEY `hotel_name` (`hotel_name`) USING BTREE,
 KEY `ota_id` (`ota_id`) USING BTREE,
 KEY `IDX_purcharse_time` (`purcharse_time`) USING BTREE,
 KEY `IDX_create_time` (`create_time`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (to_days(create_time)) ( 
PARTITION p201808 VALUES LESS THAN (to_days('2018-09-01')), 
PARTITION p201809 VALUES LESS THAN (to_days('2018-10-01')), 
PARTITION p201810 VALUES LESS THAN (to_days('2018-11-01')), 
PARTITION p201811 VALUES LESS THAN (to_days('2018-12-01')), 
PARTITION p201812 VALUES LESS THAN (to_days('2019-01-01')), 
PARTITION p201901 VALUES LESS THAN (to_days('2019-02-01')), 
PARTITION p201902 VALUES LESS THAN (to_days('2019-03-01')), 
PARTITION p201903 VALUES LESS THAN (to_days('2019-04-01')), 
PARTITION p201904 VALUES LESS THAN (to_days('2019-05-01')), 
PARTITION p201905 VALUES LESS THAN (to_days('2019-06-01')), 
PARTITION p201906 VALUES LESS THAN (to_days('2019-07-01')), 
PARTITION p201907 VALUES LESS THAN (to_days('2019-08-01')), 
PARTITION p201908 VALUES LESS THAN (to_days('2019-09-01')), 
PARTITION p201909 VALUES LESS THAN (to_days('2019-10-01')), 
PARTITION p201910 VALUES LESS THAN (to_days('2019-11-01')), 
PARTITION p201911 VALUES LESS THAN (to_days('2019-12-01')), 
PARTITION p201912 VALUES LESS THAN (to_days('2020-01-01')));

#2. Extract the data to be archived from the original table into a staging table (confirm the retention range with the developers first). Note that EXCHANGE PARTITION requires the staging table's definition to match the archive table exactly, so in practice create it with CREATE TABLE ... LIKE (minus the partitioning) rather than plain CTAS, which does not copy indexes.
create table tbl_p201808 as select * from ota_order where create_time between '2018-08-01 00:00:00' and '2018-08-31 23:59:59';

#3. Exchange the staging table with the corresponding archive table partition
alter table ota_order_bak exchange partition p201808 with table tbl_p201808;

#4. Delete the archived data from the original table in small batches
delete from ota_order where create_time between '2018-08-01 00:00:00' and '2018-08-31 23:59:59' limit 3000;
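A single delete with limit only removes one batch. A hedged sketch of looping it in a stored procedure (the procedure name is an assumption), committing per batch so the transaction and undo stay small:

DELIMITER $$
CREATE PROCEDURE purge_ota_order_batch()
BEGIN
  DECLARE v_rows INT DEFAULT 1;
  WHILE v_rows > 0 DO
    DELETE FROM ota_order
     WHERE create_time BETWEEN '2018-08-01 00:00:00' AND '2018-08-31 23:59:59'
     LIMIT 3000;
    SET v_rows = ROW_COUNT();  # rows deleted in this batch; 0 means we are done
    COMMIT;                    # commit each batch to keep the transaction and undo small
  END WHILE;
END $$
DELIMITER ;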

Optimized archiving method

#1. Create an intermediate table
CREATE TABLE `ota_order_2020` (........) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (to_days(create_time)) ( 
PARTITION p201808 VALUES LESS THAN (to_days('2018-09-01')), 
PARTITION p201809 VALUES LESS THAN (to_days('2018-10-01')), 
PARTITION p201810 VALUES LESS THAN (to_days('2018-11-01')), 
PARTITION p201811 VALUES LESS THAN (to_days('2018-12-01')), 
PARTITION p201812 VALUES LESS THAN (to_days('2019-01-01')), 
PARTITION p201901 VALUES LESS THAN (to_days('2019-02-01')), 
PARTITION p201902 VALUES LESS THAN (to_days('2019-03-01')), 
PARTITION p201903 VALUES LESS THAN (to_days('2019-04-01')), 
PARTITION p201904 VALUES LESS THAN (to_days('2019-05-01')), 
PARTITION p201905 VALUES LESS THAN (to_days('2019-06-01')), 
PARTITION p201906 VALUES LESS THAN (to_days('2019-07-01')), 
PARTITION p201907 VALUES LESS THAN (to_days('2019-08-01')), 
PARTITION p201908 VALUES LESS THAN (to_days('2019-09-01')), 
PARTITION p201909 VALUES LESS THAN (to_days('2019-10-01')), 
PARTITION p201910 VALUES LESS THAN (to_days('2019-11-01')), 
PARTITION p201911 VALUES LESS THAN (to_days('2019-12-01')), 
PARTITION p201912 VALUES LESS THAN (to_days('2020-01-01')));

#2. Insert the valid data from the original table into the new table. If the data volume is around 1 million rows, insert it directly during off-peak hours; if it is larger, use dataX, which can throttle frequency and batch size. I previously wrapped dataX with Go to auto-generate the json job files and run them with custom batch sizes.
insert into ota_order_2020 select * from ota_order where create_time between '2020-08-01 00:00:00' and '2020-08-31 23:59:59';

#3. Rename the tables
alter table ota_order rename to ota_order_bak;
alter table ota_order_2020 rename to ota_order;
#4. Insert the difference data produced during the switch
insert into ota_order select * from ota_order_bak a where not exists (select 1 from ota_order b where a.id = b.id);
#5. Convert ota_order_bak into a partitioned table. If the table is large, converting it in place is not recommended; create a partitioned table first and load it through dataX.
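For completeness, a hedged sketch of the in-place conversion that step 5 warns against on big tables: it rebuilds the entire table, blocks DML for the duration, and requires every unique key on the table to contain create_time (partition names below are illustrative):

alter table ota_order_bak partition by range (to_days(create_time)) (
  partition p201808 values less than (to_days('2018-09-01')),
  partition p201809 values less than (to_days('2018-10-01')),
  # continue the monthly partitions through p201912 exactly as in the archive table DDL above
  partition p201912 values less than (to_days('2020-01-01')));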

#6. Subsequent archiving steps
# Create an intermediate plain table (CREATE TABLE ... LIKE copies the partitioning, which must be removed, since EXCHANGE PARTITION needs a non-partitioned table)
create table ota_order_mid like ota_order;
alter table ota_order_mid remove partitioning;

# Swap the expired partition of the original table into the plain table
alter table ota_order exchange partition p201808 with table ota_order_mid;

# Swap the plain table's data into the corresponding partition of the archive table
alter table ota_order_bak exchange partition p201808 with table ota_order_mid;

In this way, both the original table and the archive table are partitioned by month; you only need to create an intermediate plain table and perform two partition exchanges during off-peak hours. This removes the invalid data and reclaims the space, produces no fragmentation, and does not disturb the indexes on the table or the SQL execution plan.
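One maintenance chore this scheme implies: the next month's partition must be added before its data arrives, for example (partition name and boundary are illustrative):

alter table ota_order add partition (partition p202001 values less than (to_days('2020-02-01')));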

Summary

From the layout of InnoDB storage space and the impact of delete on performance, we can see that physical deletion does not release disk space; instead it generates a large amount of fragmentation, causes frequent index page splits, and destabilizes SQL execution plans.

At the same time, reclaiming that fragmentation consumes a large amount of CPU and disk I/O, temporarily needs extra disk space, and blocks normal DML operations on the table.

At the business code level, use logical mark deletion and avoid physical deletion. Where data archiving is required, use the MySQL partition table feature: archiving then consists purely of DDL operations and generates no fragmentation.

Another, often better, option is ClickHouse: tables with a limited data life cycle can be stored in ClickHouse and its TTL feature used to expire invalid data automatically.
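A minimal ClickHouse sketch of the TTL idea (table and column names are assumptions; rows expire three months after create_time):

CREATE TABLE ota_order_ck
(
    id UInt64,
    order_id String,
    create_time DateTime
)
ENGINE = MergeTree
ORDER BY (create_time, id)
TTL create_time + INTERVAL 3 MONTH;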

This concludes this article on why MySQL does not recommend deleting data.

