Preface: The reason for working overtime is to go online and solve the problem of duplicate data in the online database. There are duplicate data in 6 tables in the online library, 2 of which are relatively large, one with more than 960,000 and the other with more than 300,000. Because I have dealt with the same problem before, I directly used the Emmmm, but the efficiency is too low, one piece per second, about 20,000+ duplicate data, and the estimated time is about 8 hours. . . There is a problem in blindly relying on the things of predecessors without thinking for yourself! Always wondering why it worked before and why it doesn’t work now is also a problem! I found that I have been in a bad state recently and have lost the desire to explore and seek knowledge. Today is a wake-up call for me and I feel like I have finally found my way back to the right path. Now, let's get back to the topic. The deduplication steps are introduced in detail below. 1. Discover the problemCREATE TABLE `animal` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(20) DEFAULT NULL, `age` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin; INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('1', 'cat', '12'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('2', 'dog', '13'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('3', 'camel', '25'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('4', 'cat', '32'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('5', 'dog', '42'); Goal: We want to remove data with the same First, see which data is repeated. SELECT name,count( 1 ) FROM student GROUP BY NAME HAVING count( 1 ) > 1; Output:
The data Select * From Table Where Repeating Field In (Select Repeating Field From Table Group By Repeating Field Having Count(1)>1) 2. Delete all duplicate data without leaving anyDirect deletion will result in an error. DELETE FROM student WHERE NAME IN ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1) Error: 1093 - You can't specify target table 'student' for update in FROM clause, Time: 0.016000s The reason is: while updating this table, this table is queried, while querying this table, this table is updated again, which can be understood as a deadlock. MySQL does not support this operation of updating and querying the same table. Solution: Query the columns of data to be updated as a third-party table, and then filter and update them. DELETE FROM student WHERE NAME IN ( SELECT t.NAME FROM ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1 ) t) 3. Delete duplicate data from the delete table and keep only oneBefore deleting, we can check what kind of duplicate data we want to delete. SELECT * FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) What does this mean? First, we group by 4. Start deleting duplicate data and keep only one It's very simple. Just replace DELETE FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) The execution of a table with 900,000+ records is super fast. This is the end of this article about how to filter out duplicate data during You may also be interested in:
|
>>: base target="" controls the link's target open frame
Being a web designer is not easy. Not only do you...
Table of contents Preface 1. Recursive components...
Through the brief introduction in the previous tw...
Preface Many years ago, I was a newbie on the ser...
Table of contents definition The role of the curs...
In the previous article, we played with timeouts ...
Table of contents 1. Pull the image 2. Create a R...
Table of contents 1. View hook 1. Things to note ...
Preface: I'm currently learning Linux and .Ne...
The specific code for encapsulating the image cap...
Without further ado, I will post the code for you...
Table of contents 1. Implement the $(".box1&...
Table of contents What is the Picker component Pr...
There is such a scenario: a circular container, t...
The difference between := and = = Only when setti...