Preface: The reason I was working overtime was to deploy a fix for duplicate data in the production database. Six tables in the production library contained duplicates; two of them were fairly large, one with more than 960,000 rows and the other with more than 300,000. Because I had dealt with the same problem before, I reused my old approach directly. Emmmm, but it was far too slow: about one row per second, with 20,000+ duplicate rows to remove, for an estimated running time of about 8 hours. Blindly reusing the work of predecessors without thinking for yourself is a problem! So is wondering why something worked before but doesn't work now. I realized I had been in a bad state recently and had lost the desire to explore and learn. Today was a wake-up call, and I feel I have finally found my way back to the right path. Now, back to the topic. The deduplication steps are described in detail below.

1. Discover the problem

Set up a sample table (the original article's CREATE TABLE named the table `animal`, but every later statement uses `student`, so `student` is used consistently here):

```sql
CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('1', 'cat', '12');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('2', 'dog', '13');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('3', 'camel', '25');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('4', 'cat', '32');
INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('5', 'dog', '42');
```

Goal: we want to remove rows with the same `name`. First, see which values are duplicated:

```sql
SELECT name, count(1) FROM student GROUP BY name HAVING count(1) > 1;
```

Output: `cat` and `dog` each appear twice.
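The duplicate-detection query above can be tried end to end without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in; the table and column names follow the article, while the sqlite3 setup itself is my own.

```python
import sqlite3

# In-memory database standing in for the article's MySQL `student` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "cat", 12), (2, "dog", 13), (3, "camel", 25),
     (4, "cat", 32), (5, "dog", 42)],
)

# Which names appear more than once? (ORDER BY added for a deterministic result.)
dups = conn.execute(
    "SELECT name, COUNT(1) FROM student "
    "GROUP BY name HAVING COUNT(1) > 1 ORDER BY name"
).fetchall()
print(dups)  # → [('cat', 2), ('dog', 2)]
```

The same `GROUP BY ... HAVING COUNT(1) > 1` shape works unchanged in MySQL.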
To list the duplicated rows themselves, the generic pattern is:

```sql
SELECT * FROM <table>
WHERE <dup_column> IN (
  SELECT <dup_column> FROM <table> GROUP BY <dup_column> HAVING count(1) > 1
);
```

2. Delete all duplicate data, leaving none

Direct deletion results in an error:

```sql
DELETE FROM student WHERE NAME IN (
  SELECT NAME FROM student GROUP BY NAME HAVING count(1) > 1
);
```

Error: 1093 - You can't specify target table 'student' for update in FROM clause, Time: 0.016000s

The reason: the statement queries the table while updating it, and updates it while querying it, which can be loosely understood as a deadlock. MySQL does not support updating and selecting from the same table in one statement. Solution: query the rows to be updated through a derived (third-party) table, then filter against that:

```sql
DELETE FROM student WHERE NAME IN (
  SELECT t.NAME FROM (
    SELECT NAME FROM student GROUP BY NAME HAVING count(1) > 1
  ) t
);
```

3. Delete duplicate data, keeping only one row

Before deleting, we can check which rows would be removed:

```sql
SELECT * FROM student WHERE id NOT IN (
  SELECT t.id FROM (
    SELECT MIN(id) AS id FROM student GROUP BY `name`
  ) t
);
```

What does this mean? First, we group by `name` and take the minimum `id` in each group; those are the rows we keep. Any row whose `id` is not in that set is a duplicate to be deleted.

4. Start deleting duplicate data, keeping only one

It's very simple: just replace the SELECT above with DELETE.

```sql
DELETE FROM student WHERE id NOT IN (
  SELECT t.id FROM (
    SELECT MIN(id) AS id FROM student GROUP BY `name`
  ) t
);
```

On a table with 900,000+ rows this executes very quickly. This is the end of this article about how to filter out duplicate data in MySQL.
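The keep-one deletion from step 4 can also be verified with a runnable sketch, again using Python's sqlite3 in memory (names follow the article; note that sqlite does not share MySQL's error-1093 restriction, so the subquery can reference the target table directly without the derived-table wrapper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "cat", 12), (2, "dog", 13), (3, "camel", 25),
     (4, "cat", 32), (5, "dog", 42)],
)

# Keep the row with the smallest id per name; delete the rest.
conn.execute(
    "DELETE FROM student WHERE id NOT IN "
    "(SELECT MIN(id) FROM student GROUP BY name)"
)

survivors = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(survivors)  # → [(1, 'cat'), (2, 'dog'), (3, 'camel')]
```

Exactly one row per `name` survives, each with the smallest `id` of its group, matching what the article's MySQL DELETE produces.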