The author recently hit a performance bottleneck at work. A MySQL table receives roughly 7.76 million new records per day, with a retention period of 7 days: data older than 7 days must be aged out before new records are inserted. After 9 consecutive days of operation, deleting a single day's data took about 3.5 hours (environment: 128 GB RAM, 32 cores, 4 TB disk), which was unacceptable. (Of course, if the entire table needs to be cleared, TRUNCATE TABLE is without question the right tool.) The initial solution (written before it became clear how slow deletion would be) was the simplest and most naive approach:

```sql
DELETE FROM table_name WHERE cnt_date <= target_date;
```

After further research, deleting more than 7.7 million rows was eventually brought down to lightning speed (about 1 second), with a single-table total of around 46 million rows. The optimization proceeded in stages, recorded in detail below.
Operation effect: the deletion time dropped from about 3.5 hours to 3 hours.

(1) Use LIMIT (size the batch as appropriate) to cap the number of rows deleted per statement, then check whether any matching rows remain. The source code is as follows (Python):

```python
import pandas as pd

def delete_expired_data(mysqlconn, day):
    mysqlcur = mysqlconn.cursor()
    delete_sql = "DELETE FROM table_name WHERE cnt_date <= '%s' LIMIT 50000" % day
    query_sql = "SELECT srcip FROM table_name WHERE cnt_date <= '%s' LIMIT 1" % day
    try:
        df = pd.read_sql(query_sql, mysqlconn)
        while True:
            if df is None or df.empty:
                break
            mysqlcur.execute(delete_sql)
            mysqlconn.commit()
            df = pd.read_sql(query_sql, mysqlconn)
    except Exception:
        mysqlconn.rollback()
```

(2) Increase key_buffer_size:

```python
mysqlcur.execute("SET GLOBAL key_buffer_size = 536870912")  # 512 MB
```

key_buffer_size is a global variable; for details see the official MySQL documentation: https://dev.mysql.com/doc/refman/5.7/en/server-configuration.html
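As a rough sanity check on the iteration count (my own arithmetic, not from the original post): at 50,000 rows per DELETE, one day's worth of roughly 7.76 million rows takes about 156 batches, each of which commits independently.

```python
import math

rows_per_day = 7_760_000  # approximate daily insert volume from the article
batch_size = 50_000       # LIMIT used in the batched DELETE

# Number of DELETE ... LIMIT statements needed to clear one day's data
batches = math.ceil(rows_per_day / batch_size)
print(batches)  # 156
```

Small batches keep each transaction (and the locks it holds) short, at the cost of re-running the range scan on every iteration.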
Applicable scenario: MyISAM tables.

Why: MyISAM keeps deleted rows in a linked list, and their space and row positions are reused by later INSERTs. A plain DELETE makes MySQL merge index blocks, which involves copying and moving large amounts of memory; OPTIMIZE TABLE instead rebuilds the index outright, i.e. it directly creates a fresh copy of the data blocks (think of a compacting JVM garbage-collection algorithm).

Operation effect: the deletion time dropped from 3.5 hours to 1 hour 40 minutes.

The specific code is as follows:

```python
import pandas as pd

def delete_expired_data(mysqlconn, day):
    mysqlcur = mysqlconn.cursor()
    delete_sql = "DELETE QUICK FROM table_name WHERE cnt_date <= '%s' LIMIT 50000" % day
    query_sql = "SELECT srcip FROM table_name WHERE cnt_date <= '%s' LIMIT 1" % day
    optimize_sql = "OPTIMIZE TABLE g_visit_relation_asset"
    try:
        df = pd.read_sql(query_sql, mysqlconn)
        while True:
            if df is None or df.empty:
                break
            mysqlcur.execute(delete_sql)
            mysqlconn.commit()
            df = pd.read_sql(query_sql, mysqlconn)
        # Rebuild the table once all expired rows are gone
        mysqlcur.execute(optimize_sql)
        mysqlconn.commit()
    except Exception:
        mysqlconn.rollback()
```
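Since this optimization only applies to MyISAM, it may be worth confirming the table's storage engine before relying on it. A small sketch I have added (the query itself is standard information_schema usage; the helper name is my own):

```python
def table_engine_sql(schema, table):
    """Build a query that returns the storage engine (MyISAM, InnoDB, ...) of a table."""
    return (
        "SELECT ENGINE FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s'" % (schema, table)
    )

# Usage sketch: pd.read_sql(table_engine_sql('mydb', 'table_name'), mysqlconn)
print(table_engine_sql('mydb', 'g_visit_relation_asset'))
```

On InnoDB, DELETE QUICK has no effect and OPTIMIZE TABLE is mapped to a table rebuild, so the timings above would not carry over.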
MySQL supports several partitioning types, including RANGE, KEY, LIST, and HASH; see the official documentation for details. Because the dates in this scenario keep advancing, fixed partition names under RANGE are a poor fit; HASH partitioning suits this scenario better.

(1) Define the partitioned table. The SQL statement is as follows:

```sql
ALTER TABLE table_name PARTITION BY HASH(TO_DAYS(cnt_date)) PARTITIONS 7;
```

TO_DAYS converts the date (which must be a date type, otherwise MySQL reports the error "Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed") into a day number counted from year 0, which is then hashed across the 7 partitions. In effect, the partition is the day number MOD 7.

(2) Query which partition the date to be aged falls in. The SQL statement is as follows:

```python
"explain partitions select * from g_visit_relation_asset where cnt_date = '%s'" % expired_day
```

The partitions column of the EXPLAIN output indicates the partition.
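To see which HASH partition a given date lands in without asking the server, TO_DAYS can be emulated in Python: MySQL's TO_DAYS(d) equals Python's date.toordinal(d) + 365 (both count days from a year-0 epoch, offset by one year). A small illustration of my own, not from the article:

```python
from datetime import date

def to_days(d):
    """Emulate MySQL's TO_DAYS(): day number counted from year 0."""
    return d.toordinal() + 365

def hash_partition(d, num_partitions=7):
    """Partition index chosen by PARTITION BY HASH(TO_DAYS(cnt_date)) PARTITIONS 7."""
    return to_days(d) % num_partitions

# The MySQL manual documents TO_DAYS('1997-10-07') = 729669
print(to_days(date(1997, 10, 7)))         # 729669
print(hash_partition(date(1997, 10, 7)))  # 729669 % 7 = 3, i.e. partition p3
```

With 7 partitions and a daily key, 7 consecutive dates map to 7 distinct partitions, so truncating one partition drops exactly one day's data; a date 7 days later reuses the same partition.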
(3) OPTIMIZE or REBUILD the partition. The SQL statement is as follows:

```python
"ALTER TABLE g_visit_relation_asset OPTIMIZE PARTITION %s" % partition
```

(Partition names are identifiers, not string literals, so they are not quoted.)

The complete code is as follows [Python implementation], looping backwards and clearing data up to the specified date. Two fixes over the original: the EXPLAIN SQL is rebuilt each iteration so it actually follows expired_day, and expired_day stays a date object instead of being converted to a string mid-loop. The loop is bounded because HASH(...) MOD 7 yields at most 7 distinct partitions:

```python
import pandas as pd
from datetime import timedelta

def clear_partition_data(mysqlconn, day, max_days=7):
    mysqlcur = mysqlconn.cursor()
    expired_day = day  # a datetime.date
    try:
        for _ in range(max_days):  # at most 7 distinct HASH partitions
            query_partition_sql = (
                "explain partitions select * from table_name "
                "where cnt_date = '%s'" % expired_day.strftime("%Y-%m-%d")
            )
            df = pd.read_sql(query_partition_sql, mysqlconn)
            if df is None or df.empty:
                break
            partition = df.loc[0, 'partitions']
            if pd.notnull(partition):
                clear_partition_sql = "ALTER TABLE table_name TRUNCATE PARTITION %s" % partition
                mysqlcur.execute(clear_partition_sql)
                mysqlconn.commit()
                # OPTIMIZE (or REBUILD) after the TRUNCATE PARTITION
                optimize_partition_sql = "ALTER TABLE table_name OPTIMIZE PARTITION %s" % partition
                mysqlcur.execute(optimize_partition_sql)
                mysqlconn.commit()
            # step back one day to clear the next partition
            expired_day = expired_day - timedelta(days=1)
    except Exception:
        mysqlconn.rollback()
```
If the data to be deleted exceeds 50% of the table, it is recommended to copy the rows you want to keep into a temporary table, drop the original table, and rename the temporary table to the original name. In MySQL:

```sql
INSERT INTO New SELECT * FROM Main WHERE ...;  -- just the rows you want to keep
RENAME TABLE Main TO Old, New TO Main;
DROP TABLE Old;                                -- space freed up here
```

Partitioning can be removed with ALTER TABLE table_name REMOVE PARTITIONING without deleting the corresponding data.

References:
1) https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html — partition maintenance operations
2) http://mysql.rjweb.org/doc.php/deletebig#solutions — solutions for deleting big data
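The copy-and-swap sequence above can be wrapped in a small helper that emits the statements in order. This is my own sketch: the `_new`/`_old` suffixes and the added CREATE TABLE ... LIKE step (to create the staging table with the same schema) are illustrative, not from the article.

```python
def swap_delete_sql(table, keep_condition):
    """Build the copy-and-swap statement sequence for deleting most of a table.

    Returns the statements in execution order. `<table>_new` and `<table>_old`
    are hypothetical staging names.
    """
    new, old = table + "_new", table + "_old"
    return [
        "CREATE TABLE %s LIKE %s" % (new, table),
        "INSERT INTO %s SELECT * FROM %s WHERE %s" % (new, table, keep_condition),
        "RENAME TABLE %s TO %s, %s TO %s" % (table, old, new, table),
        "DROP TABLE %s" % old,
    ]

stmts = swap_delete_sql("g_visit_relation_asset", "cnt_date > '2023-01-01'")
for s in stmts:
    print(s)
```

RENAME TABLE swaps both names in one atomic statement, so readers never see a missing table; the DROP of the old copy is what actually frees the space.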