Example of how to quickly delete a 2T table in mysql in Innodb

Example of how to quickly delete a 2T table in mysql in Innodb

Preface

This article mainly introduces the relevant content about how to quickly delete a 2T large table in MySQL in Innodb. It is shared for your reference and learning. Let's take a look at the detailed introduction.

Come, let's look at the comics to cultivate our sentiments.


OK, that’s it. Suppose you have a table erp, if you directly execute the following command

drop table erp

At this time, all mysql related processes will stop until the drop is completed, and then mysql will resume execution. The reason for this is that when dropping the table, InnoDB maintains a global lock, and the lock is released after the drop is completed.

This means that if during the day, when the traffic is very high, if you execute the command to delete a large table without taking any action, the entire MySQL server will hang. During the table deletion, the QPS will drop sharply, and then the product manager will come to you for tea. That’s why there is a scene in the comic. You can delete it at twelve o’clock in the evening when it is quiet.

Of course, some people may not agree and say, "You can write a stored procedure to delete the table and run it once at night when there is not much traffic."
I was shocked, and after thinking about it, I could only say: "Everyone, please stop arguing and listen to me talk about the common practices in the industry."

A hypothesis

Let me explain first. There is a premise here. MySQL has enabled independent tablespace, which is enabled by default after MySQL 5.6.7.

That is, in my.cnf, there is such a configuration (these are the knowledge of MySQL optimization, which will be introduced to you later)

innodb_file_per_table = 1

Check the status of the table space with the following command

mysql> show variables like '%per_table'; 
+-----------------------+-------+ 
| Variable_name | Value | 
+-----------------------+-------+ 
| innodb_file_per_table | OFF | 
+-----------------------+-------+

If the value of innodb_file_per_table is OFF, it means that a shared tablespace is used.

If the value of innodb_file_per_table is ON, it means that an independent tablespace is used.

So, everyone will ask me, what is the difference between an independent tablespace and a shared tablespace?

Shared tablespace: All table data and index files of a database are placed in one file. The default file path of this shared tablespace is in the data directory. The default file name is: ibdata1 (this file can be expanded into multiple files). Note that in this way, operation and maintenance are extremely inconvenient. You see, all the data is in one file, and it is very inconvenient to maintain a single table. In addition, when you perform a delete operation, many gaps will be left in the file, and the ibdata1 file will not shrink automatically. In other words, if you use a shared tablespace to store data, you may encounter the problem that the space cannot be released after you drop the table.

Independent tablespace: Each table is deployed in an independent manner. Each table has a .frm table description file and a .ibd file.

.frm file: saves the metadata of each table, including the definition of the table structure, etc. This file is independent of the database engine.

.ibd file: A file that stores the data and indexes of each table.

Note that in this way, each table has its own independent table space, which makes operation and maintenance easier and allows a single table to be moved between different databases. In addition, when performing the drop table operation, the table space can be automatically reclaimed. After executing the delete operation, you can use alter table TableName engine=innodb to defragment and reclaim some table space.

ps: datadir in my.cnf is used to set the data storage directory

Okay, I've said a lot above, I just want to say one thing:

In most cases, operations and maintenance will definitely choose an independent tablespace storage method for MySQL, because the independent tablespace method is much stronger from the perspective of performance optimization and operation and maintenance difficulty.

Therefore, the premise I mentioned at the beginning is that MySQL needs to open an independent table space. This assumption is true 90% of the time. If you really encounter a situation where your company's MySQL uses a shared table space, please have a talk with your operation and maintenance staff and ask them why a shared table space is used.

Correct posture

Assume that we have datadir = /data/mysql/, and we have a database named mytest. In the database mytest, there is a table named erp. Execute the following command

mysql> system ls -l /data/mysql/mytest/

I get the following output (I filtered it a bit)

-rw-r----- 1 mysql mysql 9023 8 18 05:21 erp.frm
-rw-r----- 1 mysql mysql 2356792000512 8 18 05:21 erp.ibd

The functions of frm and ibd have been introduced above. Now the erp.ibd file is too large, so the deletion is stuck.

How to solve this problem?

Here we need to use the knowledge of hard links in Linux to perform quick deletion. Let me share some of the content from "Bird Brother's Private Kitchen".

In fact, you can understand soft links as shortcuts in Windows, so I won’t introduce them in detail, but mainly introduce hard links.

As for this hard link, I will just briefly talk about it. I don't want to post a lot of words here, it will seem too tiring.

For the actual stored files, there is a

Then there is a file name pointing to the node Index above


So, the so-called hard link means that there is more than one file name pointing to the node Index, and there are several file names pointing to the node Index.

Assume that there will be another file name pointing to the above node Index, that is

At this time, you delete the file name (1). The Linux system detects that there is a file name (2) pointing to the node index. Therefore, the file is not actually deleted, but the reference in step (2) is deleted. This operation is very fast because it only deletes the reference. So the picture became like this

Next, you delete the file name (2). The Linux system detects that there is no other file name pointing to the node index, and deletes the actual storage file. This operation is to delete the actual file, so it is relatively slow.

OK, we use the above principle.

First create a hard link for erp.ibd, using the ln command

mysql> system ln /data/mysql/mytest/erp.ibd /data/mysql/mytest/erp.ibd.hdlk

At this point, the file directory is as follows

-rw-r----- 1 mysql mysql 9023 8 18 05:21 erp.frm
-rw-r----- 2 mysql mysql 2356792000512 8 18 05:21 erp.ibd
-rw-r----- 2 mysql mysql 2356792000512 8 18 05:21 erp.ibd.hdlk

You will find that there is an additional erp.ibd.hdlk file, and the inode of erp.ibd and erp.ibd.hdlk are both 2.

At this point, you execute the drop table operation

mysql> drop table erp;
Query OK, 0 rows affected (0.99 sec)

You will find that it is deleted in less than 1 second. Because, at this time there are two file names (erp.ibd and erp.ibd.hdlk), pointing to the same inode. At this time, executing the deletion operation only deletes the reference, so it is very fast.

Then, the deletion at this time has deleted the table from mysql. However, the disk space has not been released because there is still a file erp.ibd.hdlk left.

How to delete erp.ibd.hdlk correctly?

If you have no experience, you will definitely answer me and use the rm command to delete it. It should be noted here that in a production environment, directly using the rm command to delete large files will cause a surge in disk IO overhead and excessive CPU load, which will affect the operation of other programs.

Well, at this time, you should use the truncate command to delete it. The truncate command is in the coreutils tool set.

For details, you can go to Baidu. Someone has tested the rm and truncate commands and found that the truncate command has almost no effect on disk IO and CPU load.

The deletion script is as follows

TRUNCATE=/usr/local/bin/truncate
for i in `seq 2194 -10 10 `; 
do 
 sleep 2
 $TRUNCATE -s ${i}G /data/mysql/mytest/erp.ibd.hdlk 
done
rm -rf /data/mysql/mytest/erp.ibd.hdlk;

Start from 2194G, reduce 10G each time, stop for 2 seconds, and continue until only 10G is left in the file. Finally, use the rm command to delete the rest.

Other situations

What this means is what to do if the database is deployed on Windows. My answer to this question is actually not professional enough. Since my debut, I have never encountered a situation where MySQL is installed on Windows in the production environment. Assuming you really encounter this problem, there is a tool called mklink under Windows, which is used to create hard link locks under Windows, and should be able to perform similar functions.

Summarize

The content discussed in this article is more likely to be encountered in the R&D of small and medium-sized companies. Because small and medium-sized companies do not have professional DBAs, R&D guys have to do everything. I hope everyone has gained something from it.

Well, the above is the full content of this article. I hope that the content of this article will have certain reference learning value for your study or work. If you have any questions, you can leave a message to communicate. Thank you for your support of 123WORDPRESS.COM.

You may also be interested in:
  • How to implement batch deletion of large amounts of data in MySQL large tables
  • Detailed explanation of how to gracefully delete a large table in MySQL
  • Solution to MySQL performance problem of deleting large tables
  • A brief discussion on how to elegantly delete large tables in MySQL

<<:  How to run JavaScript in Jupyter Notebook

>>:  How to quickly deploy Gitlab using Docker

Recommend

Vue+swiper realizes timeline effect

This article shares the specific code of vue+swip...

How to solve the problem of blurry small icons on mobile devices

Preface Previously, I talked about the problem of...

Proxy realizes the principle of two-way binding of Vue3 data

Table of contents 1. Advantages of proxy vs. Obje...

The process of installing Docker in Linux system

In this blog, I will walk you through the process...

Solve the Linux Tensorflow2.0 installation problem

conda update conda pip install tf-nightly-gpu-2.0...

Vue page monitoring user preview time function implementation code

A recent business involves such a requirement tha...

JavaScript method to detect the type of file

Table of contents 1. How to view the binary data ...

How to use CSS to write different styles according to sub-elements

The effect we need to achieve: What is needed The...

A few things you need to know about responsive layout

1. Introduction Responsive Web design allows a we...

A brief discussion on tags in HTML

0. What is a tag? XML/HTML CodeCopy content to cl...

ffmpeg Chinese parameter description and usage examples

1. When ffmpeg pushes video files, the encoding f...

Common JavaScript memory errors and solutions

Table of contents 1. Timer monitoring 2. Event mo...

Detailed explanation of Linux file permissions and group modification commands

In Linux, everything is a file (directories are a...

View the command to modify the MySQL table structure

Brief description The editor often encounters som...