A brief discussion on how to elegantly delete large tables in MySQL

As time goes by or the business volume grows, database space usage keeps climbing. When the space is about to hit a bottleneck, we may discover that the database contains one or two super large tables. They have accumulated all the data since the business started, yet 90% of that data has no business value. How should we deal with these large tables?

Since the data is worthless, we usually either delete it directly or archive it first and then delete it. Deletion can be done in two ways:

  • Delete all data in the table directly through truncate
  • Delete the records that meet the conditions in the table through delete

1. Truncate operation

Logically, the truncate operation removes all rows from a table, but it is not the same as delete from table_name where 1=1. To make whole-table deletion fast, MySQL implements truncate as a drop of the table followed by a re-create of the table. Because of this, truncate is a DDL operation and cannot be rolled back.

1.1 What operations does MySQL truncate perform?

  • The truncate operation is actually carried out in two steps: drop and re-create.
  • The first stage of the drop clears the table's pages from the buffer pool and removes its dirty pages from the flush list without flushing them to disk. The bottleneck here is that removing entries from the flush list requires holding the lock of the corresponding buffer pool instance while traversing it; if the buffer pool instance is large and many of the table's pages sit on the flush list, other transactions are blocked while trying to acquire that lock, which hurts database performance.
  • The second stage of the drop deletes the .ibd file on disk. The larger the physical file, the more I/O it consumes and the longer the deletion takes.
  • In the re-create stage, as long as the .frm file of the dropped table is intact, the table can be rebuilt from the original table definition after the drop. The auto_increment value of the rebuilt table is reset (this reset, and the fact that truncate cannot be rolled back, are illustrated in the small sketch after this list).
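
As a small illustration of two of the points above, the following sketch uses a hypothetical table t_demo (not from the original text) to show that truncate is implicitly committed and cannot be rolled back, and that the rebuilt table starts its auto_increment counter from 1 again:

create table t_demo (id int auto_increment primary key, v varchar(10));
insert into t_demo (v) values ('a'), ('b'), ('c');

begin;
truncate table t_demo;          -- DDL: causes an implicit commit and cannot be rolled back
rollback;                       -- has no effect on the truncate
select count(*) from t_demo;    -- returns 0

insert into t_demo (v) values ('d');
select id from t_demo;          -- returns 1: the auto_increment counter was reset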

1.2 How to optimize the resource consumption caused by the truncate operation?

  • For the first stage of the drop in the truncate operation: when the innodb_buffer_pool_size allocated to the MySQL instance exceeds 1GB, set the innodb_buffer_pool_instances parameter to a reasonable value. This improves concurrency and also shortens the time the lock on each buffer pool instance is held while it is scanned.
  • For the second stage of the drop in the truncate operation: before dropping the table, create a hard link to its .ibd file. The drop then finishes quickly at the MySQL level and the performance impact on the database is reduced; afterwards, clean up the hard link manually at the operating-system level (a rough sketch follows this list).
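
A rough sketch of the hard-link approach, assuming a table t1 in a schema named mydb under the default datadir /var/lib/mysql (the paths, sizes, and schema name are only examples; the operating-system commands are shown as comments):

show variables like 'innodb_buffer_pool_instances';  -- static parameter; change it in my.cnf and restart
-- At the OS level, before the drop, create a hard link so the drop only removes a directory entry:
--   ln /var/lib/mysql/mydb/t1.ibd /var/lib/mysql/mydb/t1.ibd.hdlk
drop table t1;   -- returns quickly, because the data blocks are still referenced by the hard link
-- Afterwards, release the disk space gradually in a low-traffic window, for example:
--   truncate -s -10G /var/lib/mysql/mydb/t1.ibd.hdlk   (repeat until the file is small)
--   rm /var/lib/mysql/mydb/t1.ibd.hdlk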

2. Delete operation

2.1 What operations does MySQL delete perform?

  • Based on the where condition, an index scan or full table scan is performed on the table to check whether each row matches. Every row that is scanned is locked during this stage, which makes it the biggest source of resource consumption: if the table is large and the delete cannot use an index to narrow down the scan, this step causes heavy lock contention and CPU/IO consumption on the database (a quick explain check is sketched after this list).
  • The locks on rows that do not match the where condition are released after the check, so InnoDB ends up holding locks only on the rows that are actually deleted. This keeps lock contention down, but note that deleting a large amount of data in one statement still produces a huge binlog transaction, which is unfriendly both to MySQL itself and to the slave databases in a master-slave architecture, and may cause a long replication delay.
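
Before running a large delete it is worth checking with explain (supported for delete statements since MySQL 5.6) whether the where condition can actually use an index. A minimal sketch, assuming the t1 table used later in this article has an index on create_time:

explain delete from t1 where create_time < '2021-01-01 00:00:00';
-- If the key column shows the create_time index, the delete is driven by the index;
-- if key is NULL and type is ALL, the delete will scan and lock its way through the whole table.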

2.2 How to optimize delete operations?

  • Deleting all data in a table should be done with caution; consider using the truncate operation instead.
  • For delete ... where ..., the where condition should make effective use of an index to reduce the amount of data scanned and avoid a full table scan.
  • For large-scale deletes whose conditions have no index, add a range filter on the auto-increment primary key or on an indexed time field to the delete statement, so that each statement deletes a small amount of data and the work is done in multiple batches.
  • For the classic scenario of keeping recent data and deleting historical data, create an xxx_tmp table with the same structure and use insert into xxx_tmp select ... to copy the data that needs to be kept into the tmp table. Then use rename to move the current business table xxx to xxx_bak and to give the xxx_tmp table the business table name xxx. Finally, drop the useless large table xxx_bak manually.

2.3 Two common scenarios of delete

2.3.1 The delete's where condition has no usable index

A common scenario is that the business needs to delete the rows of t1 where condition1=xxx, and the condition1 field cannot make effective use of an index. In this case we usually do the following:

  • Check which indexes of the current table structure can be used effectively, and try to use the table's auto-increment primary key or an indexed time field.
  • Using the auto-increment primary key index or the time index, add a range filter on the indexed field to the delete statement so that each statement deletes a small amount of data and the work is done in multiple batches. The batch size has to be evaluated against the actual business situation; the point is to avoid deleting a large amount of data in one go.
-- Use the auto-increment primary key index to delete in batches
delete from t1 where condition1=xxx and id >= 1 and id < 50000;
delete from t1 where condition1=xxx and id >= 50000 and id < 100000;

-- Use the time index to delete in batches
delete from t1 where condition1=xxx and create_time >= '2021-01-01 00:00:00' and create_time < '2021-02-01 00:00:00';
delete from t1 where condition1=xxx and create_time >= '2021-02-01 00:00:00' and create_time < '2021-03-01 00:00:00';
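
When there are many batches, the range-based deletes above can be wrapped in a simple stored procedure. The following is only a sketch (the procedure name, batch size, and the pause between batches are arbitrary choices, and condition1 = xxx is the placeholder filter from the text), walking the auto-increment primary key in fixed-size ranges:

delimiter //
create procedure batch_delete_t1(in p_max_id bigint)
begin
  declare v_from bigint default 1;
  while v_from <= p_max_id do
    -- with autocommit enabled, each batch commits on its own
    delete from t1 where condition1 = xxx and id >= v_from and id < v_from + 50000;
    do sleep(0.5);   -- brief pause between batches to ease pressure on the server and the replicas
    set v_from = v_from + 50000;
  end while;
end //
delimiter ;

select max(id) into @max_id from t1;
call batch_delete_t1(@max_id);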

2.3.2 Keep recent data and delete historical data

A common scenario is that only the last three months of data in table t1 needs to be kept, and the rest of the historical data deleted. Our usual approach is:

Create a t1_tmp table to temporarily store data that needs to be retained

create table t1_tmp like t1;

Using the indexed time field, write the data to be kept into the t1_tmp table in batches. Note that the most recent time range is deliberately skipped for now; it will be copied after the rename.

-- Split the data into batches according to the actual business volume; keep each batch reasonably small
insert into t1_tmp select * from t1 where create_time >= '2021-01-01 00:00:00' and create_time < '2021-02-01 00:00:00';
insert into t1_tmp select * from t1 where create_time >= '2021-02-01 00:00:00' and create_time < '2021-03-01 00:00:00';

-- The last batch of data is not copied yet
-- insert into t1_tmp select * from t1 where create_time >= '2021-03-01 00:00:00' and create_time < '2021-04-01 00:00:00';

Use the rename operation to move the current business table t1 to t1_bak, and then rename t1_tmp to the business table name t1. If there is frequent DML against the table being changed, this step will cause a brief business access failure.

alter table t1 rename to t1_bak;
alter table t1_tmp rename to t1;

Write the last batch of data into the new business table. The purpose of this step is to minimize the amount of data missed during the change window.

insert into t1 select * from t1_bak where create_time >= '2021-03-01 00:00:00' and create_time < '2021-04-01 00:00:00';

Another point to pay attention to in the rename step is whether the primary key of the table is an auto-increment key or a business-level unique uuid. If it is an auto-increment primary key, we also need to raise the auto_increment value of the t1_tmp table before the rename, making sure the value we set is large enough to cover the data written during the change window.

alter table t1_tmp auto_increment = {current auto_increment value of t1} + {estimated growth during the change window};
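
To pick a concrete value, the current auto_increment of t1 can be read before the rename, for example as below (mydb is a placeholder schema name; note that in MySQL 8.0 the information_schema value may be cached unless information_schema_stats_expiry is set to 0):

show table status like 't1';   -- the Auto_increment column holds the current counter
select auto_increment
  from information_schema.tables
 where table_schema = 'mydb' and table_name = 't1';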

3. Comparison of the advantages and disadvantages of truncate and delete

Truncate
  • Description: deletes all data in the table
  • Advantages: no need to scan the table data, so execution is efficient; the data file is physically removed, so the space is released quickly
  • Disadvantages: it is a DDL operation that cannot be rolled back, and it cannot delete by condition

Delete
  • Description: deletes the rows that match the specified where condition
  • Advantages: rows can be filtered and deleted according to a condition
  • Disadvantages: efficiency depends on how well the where condition can use an index; deleting a large amount of data produces a large volume of binlog and is slow; the freed space becomes fragmentation inside the tablespace rather than being released directly

This concludes this brief discussion of how to elegantly delete large tables in MySQL.
