Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL

Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL

The author recently encountered a performance bottleneck problem in his work. The MySQL table has approximately 7.76 million new records added every day, and the storage period is 7 days. Data older than 7 days needs to be aged before new records are added. After running for 9 consecutive days, it takes about 3.5 hours to delete one day's data (environment: 128G, 32 cores, 4T hard drive), which is unacceptable. Of course, if you want to delete the entire table, there is no doubt to use

Just TRUNCATE TABLE.

The initial solution (because it was not expected that the deletion would be so slow) is as follows (the simplest and most naive method):

delete from table_name where cnt_date <= target_date

After further research, we finally achieved the goal of deleting more than 7.7 million data records at a lightning speed (about 1 second). The total amount of data in a single table was around 46 million. The optimization process was carried out in layers, and the details are recorded as follows:

  • Batch delete (limit a certain number each time), and then delete in a loop until all data is deleted; at the same time, key_buffer_size is increased from the default 8M to 512M

Operation effect: The deletion time was increased from about 3.5 hours to 3 hours

(1) Use limit (the specific size should be set as appropriate) to limit the amount of data deleted at one time, and then determine whether the data has been deleted. The source code is as follows (Python implementation):

def delete_expired_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 delete_sql = "DELETE from table_name where cnt_date<='%s' limit 50000" % day
 query_sql = "select srcip from table_name where cnt_date <= '%s' limit 1" % day
 try: 
  df = pd.read_sql(query_sql, mysqlconn)
  while True:
   if df is None or df.empty:
    break
   mysqlcur.execute(delete_sql)
   mysqlconn.commit()

   df = pd.read_sql(query_sql, mysqlconn)
 except:
  mysqlconn.rollback()

(2) Increase key_buffer_size

mysqlcur.execute("SET GLOBAL key_buffer_size = 536870912")

key_buffer_size is a global variable. For details, see the official MySQL documentation: https://dev.mysql.com/doc/refman/5.7/en/server-configuration.html

  • DELETE QUICK + OPTIMIZETABLE

Applicable scenario: MyISAM Tables

Why: MyISAM maintains deleted data in a linked list, and the space and row positions will be reused by Insert data later. After a direct delete, MySQL will merge the index blocks, which involves copying and moving a large amount of memory; while OPTIMIZE TABLE directly rebuilds the index, that is, directly creates a new copy of the data block (think of the JVM garbage collection algorithm).

Operation effect: The deletion time was increased from 3.5 hours to 1 hour and 40 minutes

The specific code is as follows:

def delete_expired_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 delete_sql = "DELETE QUICK from table_name where cnt_date<='%s' limit 50000" % day
 query_sql = "select srcip from table_name where cnt_date <= '%s' limit 1" % day
 optimize_sql = "OPTIMIZE TABLE g_visit_relation_asset"
 try: 
  df = pd.read_sql(query_sql, mysqlconn)
  while True:
   if df is None or df.empty:
    break
   mysqlcur.execute(delete_sql)
   mysqlconn.commit()

   df = pd.read_sql(query_sql, mysqlconn)
  mysqlcur.execute(optimize_sql)
  mysqlconn.commit()
 except:
  mysqlconn.rollback()
  • Table partition, directly delete the partition with the expiration date (final solution - flash sale)

There are several ways to partition a MySQL table, including RANGE, KEY, LIST, and HASH. For details, see the official documentation. Because the date of the application scenario here is changing, it is not suitable to use RANGE to set a fixed partition name. HASH partition is more suitable for this scenario.

(1) Partition table definition, the SQL statement is as follows:

ALTER TABLE table_name PARTITION BY HASH(TO_DAYS(cnt_date)) PARTITIONS 7;

TO_DAYS converts the date (must be a date type, otherwise an error will be reported: Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed) into days (the total number of days in the year, month, and day), and then HASH; creates 7 partitions. Actually, it’s days MOD 7.

(2) Query the partition where the date to be aged is located. The SQL statement is as follows:

"explain partitions select * from g_visit_relation_asset where cnt_date = '%s'" % expired_day

The execution results are as follows (the partitions column indicates the partition):

+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | table_name | p1 | ALL | cnt_date_index | NULL | NULL | NULL | 1325238 | 100.00 | Using where |
+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)

(3) OPTIMIZE or REBUILD partition, the SQL statement is as follows:

"ALTER TABLE g_visit_relation_asset OPTIMIZE PARTITION '%s'" % partition

The complete code is as follows [Python implementation], which loops and deletes data that is less than the specified date:

def clear_partition_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 expired_day = day
 query_partition_sql = "explain partitions select * from table_name where cnt_date = '%s'" % expired_day
 # OPTIMIZE or REBUILD after truncate partition
 try: 
  while True:
   df = pd.read_sql(query_partition_sql, mysqlconn)
   if df is None or df.empty:
    break
   partition = df.loc[0, 'partitions']
   if partition is not None:
    clear_partition_sql = "alter table table_name TRUNCATE PARTITION %s" % partition
    mysqlcur.execute(clear_partition_sql)
    mysqlconn.commit()

    optimize_partition_sql = "ALTER TABLE table_name OPTIMIZE PARTITION %s" % partition
    mysqlcur.execute(optimize_partition_sql)
    mysqlconn.commit()
   
   expired_day = (expired_day - timedelta(days = 1)).strftime("%Y-%m-%d")
   df = pd.read_sql(query_partition_sql, mysqlconn)
 except:
  mysqlconn.rollback()
  • other

If the deleted data exceeds 50% of the table data, it is recommended to copy the required data to a temporary table, then delete the original table, and then rename the temporary table to the original table. MySQL is as follows:

 INSERT INTO New
  SELECT * FROM Main
   WHERE ...; -- just the rows you want to keep
 RENAME TABLE main TO Old, New TO Main;
 DROP TABLE Old; -- Space freed up here

You can delete partitions by: ALTER TABLE table_name REMOVE PARTITIONING without deleting the corresponding data

refer to:

1) https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html Specific partitioning instructions

2) http://mysql.rjweb.org/doc.php/deletebig#solutions Solutions for deleting large data

The copyright of this article is shared by the author and Blog Garden. Reprinting is welcome, but this statement must be retained without the author's consent, and the original link must be given in a prominent position on the article page. Otherwise, we reserve the right to pursue legal liability.

************************************************************************

Energy is limited, and there are too many ideas. Just focus on doing one thing well.

I'm just a programmer. Write good code within 5 years, polish every word of technical blogs, and insist on zero copy and originality. The significance of blogging is to polish your writing style, train your logic and orderliness, and deepen your systematic understanding of knowledge. If it happens to be helpful to others, it is really a happy thing.

This concludes this article on several practical solutions for quickly deleting large amounts of data (tens of millions) from MySQL. For more information on quickly deleting large amounts of data from MySQL, please search for previous articles on 123WORDPRESS.COM or continue browsing the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Optimizing query speed of MySQL with tens of millions of data using indexes
  • MySQL loop inserts tens of millions of data
  • How to quickly paginate MySQL data volumes of tens of millions
  • Summary of SQL query optimization knowledge points for MySQL tens of millions of big data
  • MySql quick insert tens of millions of large data examples
  • How to quickly create tens of millions of test data in MySQL
  • Detailed explanation of 30 SQL query optimization techniques for MySQL tens of millions of large data
  • Optimizing the performance of paging query for MySQL with tens of millions of data
  • How to optimize MySQL fast paging for tens of millions of pages
  • Detailed explanation of MySQL database tens of millions of data query and storage

<<:  Webservice remote debugging and timeout operation principle analysis

>>:  React and Redux array processing explanation

Recommend

Why TypeScript's Enum is problematic

Table of contents What happened? When to use Cont...

A simple way to implement all functions of shopping cart in Vue

The main functions are as follows: Add product in...

MySQL data duplicate checking and deduplication implementation statements

There is a table user, and the fields are id, nic...

A brief discussion on MySQL select optimization solution

Table of contents Examples from real life Slow qu...

MySQL 5.6.27 Installation Tutorial under Linux

This article shares the installation tutorial of ...

A brief discussion on two methods to solve space-evenly compatibility issues

Since its launch in 2009, flex has been supported...

mysql8 Common Table Expression CTE usage example analysis

This article uses an example to describe how to u...

Detailed analysis of classic JavaScript recursion case questions

Table of contents What is recursion and how does ...

Detailed explanation of CSS3 animation and new features of HTML5

1. CSS3 animation ☺CSS3 animations are much easie...

Web design tips on form input boxes

1. Dashed box when cancel button is pressed <br...

About WSL configuration and modification issues in Docker

https://docs.microsoft.com/en-us/windows/wsl/wsl-...

Summary of block-level elements, inline elements, and variable elements

Block element p - paragraph pre - format text tabl...

Several ways to connect tables in MySQL

The connection method in MySQL table is actually ...

Two methods to stretch the background image of a web page

There are two solutions: One is CSS, using backgro...