Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL

Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL

The author recently encountered a performance bottleneck problem in his work. The MySQL table has approximately 7.76 million new records added every day, and the storage period is 7 days. Data older than 7 days needs to be aged before new records are added. After running for 9 consecutive days, it takes about 3.5 hours to delete one day's data (environment: 128G, 32 cores, 4T hard drive), which is unacceptable. Of course, if you want to delete the entire table, there is no doubt to use

Just TRUNCATE TABLE.

The initial solution (because it was not expected that the deletion would be so slow) is as follows (the simplest and most naive method):

delete from table_name where cnt_date <= target_date

After further research, we finally achieved the goal of deleting more than 7.7 million data records at a lightning speed (about 1 second). The total amount of data in a single table was around 46 million. The optimization process was carried out in layers, and the details are recorded as follows:

  • Batch delete (limit a certain number each time), and then delete in a loop until all data is deleted; at the same time, key_buffer_size is increased from the default 8M to 512M

Operation effect: The deletion time was increased from about 3.5 hours to 3 hours

(1) Use limit (the specific size should be set as appropriate) to limit the amount of data deleted at one time, and then determine whether the data has been deleted. The source code is as follows (Python implementation):

def delete_expired_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 delete_sql = "DELETE from table_name where cnt_date<='%s' limit 50000" % day
 query_sql = "select srcip from table_name where cnt_date <= '%s' limit 1" % day
 try: 
  df = pd.read_sql(query_sql, mysqlconn)
  while True:
   if df is None or df.empty:
    break
   mysqlcur.execute(delete_sql)
   mysqlconn.commit()

   df = pd.read_sql(query_sql, mysqlconn)
 except:
  mysqlconn.rollback()

(2) Increase key_buffer_size

mysqlcur.execute("SET GLOBAL key_buffer_size = 536870912")

key_buffer_size is a global variable. For details, see the official MySQL documentation: https://dev.mysql.com/doc/refman/5.7/en/server-configuration.html

  • DELETE QUICK + OPTIMIZETABLE

Applicable scenario: MyISAM Tables

Why: MyISAM maintains deleted data in a linked list, and the space and row positions will be reused by Insert data later. After a direct delete, MySQL will merge the index blocks, which involves copying and moving a large amount of memory; while OPTIMIZE TABLE directly rebuilds the index, that is, directly creates a new copy of the data block (think of the JVM garbage collection algorithm).

Operation effect: The deletion time was increased from 3.5 hours to 1 hour and 40 minutes

The specific code is as follows:

def delete_expired_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 delete_sql = "DELETE QUICK from table_name where cnt_date<='%s' limit 50000" % day
 query_sql = "select srcip from table_name where cnt_date <= '%s' limit 1" % day
 optimize_sql = "OPTIMIZE TABLE g_visit_relation_asset"
 try: 
  df = pd.read_sql(query_sql, mysqlconn)
  while True:
   if df is None or df.empty:
    break
   mysqlcur.execute(delete_sql)
   mysqlconn.commit()

   df = pd.read_sql(query_sql, mysqlconn)
  mysqlcur.execute(optimize_sql)
  mysqlconn.commit()
 except:
  mysqlconn.rollback()
  • Table partition, directly delete the partition with the expiration date (final solution - flash sale)

There are several ways to partition a MySQL table, including RANGE, KEY, LIST, and HASH. For details, see the official documentation. Because the date of the application scenario here is changing, it is not suitable to use RANGE to set a fixed partition name. HASH partition is more suitable for this scenario.

(1) Partition table definition, the SQL statement is as follows:

ALTER TABLE table_name PARTITION BY HASH(TO_DAYS(cnt_date)) PARTITIONS 7;

TO_DAYS converts the date (must be a date type, otherwise an error will be reported: Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed) into days (the total number of days in the year, month, and day), and then HASH; creates 7 partitions. Actually, it’s days MOD 7.

(2) Query the partition where the date to be aged is located. The SQL statement is as follows:

"explain partitions select * from g_visit_relation_asset where cnt_date = '%s'" % expired_day

The execution results are as follows (the partitions column indicates the partition):

+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | table_name | p1 | ALL | cnt_date_index | NULL | NULL | NULL | 1325238 | 100.00 | Using where |
+----+-------------+------------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)

(3) OPTIMIZE or REBUILD partition, the SQL statement is as follows:

"ALTER TABLE g_visit_relation_asset OPTIMIZE PARTITION '%s'" % partition

The complete code is as follows [Python implementation], which loops and deletes data that is less than the specified date:

def clear_partition_data(mysqlconn, day):
 mysqlcur = mysqlconn.cursor()
 expired_day = day
 query_partition_sql = "explain partitions select * from table_name where cnt_date = '%s'" % expired_day
 # OPTIMIZE or REBUILD after truncate partition
 try: 
  while True:
   df = pd.read_sql(query_partition_sql, mysqlconn)
   if df is None or df.empty:
    break
   partition = df.loc[0, 'partitions']
   if partition is not None:
    clear_partition_sql = "alter table table_name TRUNCATE PARTITION %s" % partition
    mysqlcur.execute(clear_partition_sql)
    mysqlconn.commit()

    optimize_partition_sql = "ALTER TABLE table_name OPTIMIZE PARTITION %s" % partition
    mysqlcur.execute(optimize_partition_sql)
    mysqlconn.commit()
   
   expired_day = (expired_day - timedelta(days = 1)).strftime("%Y-%m-%d")
   df = pd.read_sql(query_partition_sql, mysqlconn)
 except:
  mysqlconn.rollback()
  • other

If the deleted data exceeds 50% of the table data, it is recommended to copy the required data to a temporary table, then delete the original table, and then rename the temporary table to the original table. MySQL is as follows:

 INSERT INTO New
  SELECT * FROM Main
   WHERE ...; -- just the rows you want to keep
 RENAME TABLE main TO Old, New TO Main;
 DROP TABLE Old; -- Space freed up here

You can delete partitions by: ALTER TABLE table_name REMOVE PARTITIONING without deleting the corresponding data

refer to:

1) https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html Specific partitioning instructions

2) http://mysql.rjweb.org/doc.php/deletebig#solutions Solutions for deleting large data

The copyright of this article is shared by the author and Blog Garden. Reprinting is welcome, but this statement must be retained without the author's consent, and the original link must be given in a prominent position on the article page. Otherwise, we reserve the right to pursue legal liability.

************************************************************************

Energy is limited, and there are too many ideas. Just focus on doing one thing well.

I'm just a programmer. Write good code within 5 years, polish every word of technical blogs, and insist on zero copy and originality. The significance of blogging is to polish your writing style, train your logic and orderliness, and deepen your systematic understanding of knowledge. If it happens to be helpful to others, it is really a happy thing.

This concludes this article on several practical solutions for quickly deleting large amounts of data (tens of millions) from MySQL. For more information on quickly deleting large amounts of data from MySQL, please search for previous articles on 123WORDPRESS.COM or continue browsing the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • Optimizing query speed of MySQL with tens of millions of data using indexes
  • MySQL loop inserts tens of millions of data
  • How to quickly paginate MySQL data volumes of tens of millions
  • Summary of SQL query optimization knowledge points for MySQL tens of millions of big data
  • MySql quick insert tens of millions of large data examples
  • How to quickly create tens of millions of test data in MySQL
  • Detailed explanation of 30 SQL query optimization techniques for MySQL tens of millions of large data
  • Optimizing the performance of paging query for MySQL with tens of millions of data
  • How to optimize MySQL fast paging for tens of millions of pages
  • Detailed explanation of MySQL database tens of millions of data query and storage

<<:  Webservice remote debugging and timeout operation principle analysis

>>:  React and Redux array processing explanation

Recommend

Notes on using the blockquote tag

<br />Semanticization cannot be explained in...

The perfect solution for highlighting keywords in HTML

I recently encountered a feature while working on...

Vue implements horizontal scrolling of marquee style text

This article shares the specific code for Vue to ...

Functions in TypeScript

Table of contents 1. Function definition 1.1 Func...

How to install jupyter in docker on centos and open ports

Table of contents Install jupyter Docker port map...

Detailed steps to implement the Excel import function in Vue

1. Front-end-led implementation steps The first s...

vue+element custom query component

This article mainly introduces the Vue project. O...

Tutorial on installing MySQL under Linux

Table of contents 1. Delete the old version 2. Ch...

How to check the version of Kali Linux system

1. Check the kali linux system version Command: c...

MySQL 8.0 New Features - Introduction to Check Constraints

Table of contents Preface Check Constraints Creat...

MySQL initialization password operation under Mac

A simple record of the database startup problems ...

How to update, package, and upload Docker containers to Alibaba Cloud

This time, we will try to package the running con...

Mysql delete data and data table method example

It is very easy to delete data and tables in MySQ...

Simple example of using Docker container

Table of contents 1. Pull the image 2. Run the im...

Explanation of the new feature of Hadoop 2.X, the recycle bin function

By turning on the Recycle Bin function, you can r...