Preface: The reason for working overtime is to go online and solve the problem of duplicate data in the online database. There are duplicate data in 6 tables in the online library, 2 of which are relatively large, one with more than 960,000 and the other with more than 300,000. Because I have dealt with the same problem before, I directly used the Emmmm, but the efficiency is too low, one piece per second, about 20,000+ duplicate data, and the estimated time is about 8 hours. . . There is a problem in blindly relying on the things of predecessors without thinking for yourself! Always wondering why it worked before and why it doesn’t work now is also a problem! I found that I have been in a bad state recently and have lost the desire to explore and seek knowledge. Today is a wake-up call for me and I feel like I have finally found my way back to the right path. Now, let's get back to the topic. The deduplication steps are introduced in detail below. 1. Discover the problemCREATE TABLE `animal` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(20) DEFAULT NULL, `age` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin; INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('1', 'cat', '12'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('2', 'dog', '13'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('3', 'camel', '25'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('4', 'cat', '32'); INSERT INTO `pilipa_dds`.`student` (`id`, `name`, `age`) VALUES ('5', 'dog', '42'); Goal: We want to remove data with the same First, see which data is repeated. SELECT name,count( 1 ) FROM student GROUP BY NAME HAVING count( 1 ) > 1; Output:
The data Select * From Table Where Repeating Field In (Select Repeating Field From Table Group By Repeating Field Having Count(1)>1) 2. Delete all duplicate data without leaving anyDirect deletion will result in an error. DELETE FROM student WHERE NAME IN ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1) Error: 1093 - You can't specify target table 'student' for update in FROM clause, Time: 0.016000s The reason is: while updating this table, this table is queried, while querying this table, this table is updated again, which can be understood as a deadlock. MySQL does not support this operation of updating and querying the same table. Solution: Query the columns of data to be updated as a third-party table, and then filter and update them. DELETE FROM student WHERE NAME IN ( SELECT t.NAME FROM ( SELECT NAME FROM student GROUP BY NAME HAVING count( 1 ) > 1 ) t) 3. Delete duplicate data from the delete table and keep only oneBefore deleting, we can check what kind of duplicate data we want to delete. SELECT * FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) What does this mean? First, we group by 4. Start deleting duplicate data and keep only one It's very simple. Just replace DELETE FROM student WHERE id NOT IN ( SELECT t.id FROM ( SELECT MIN( id ) AS id FROM student GROUP BY `name` ) t ) The execution of a table with 900,000+ records is super fast. This is the end of this article about how to filter out duplicate data during You may also be interested in:
|
>>: base target="" controls the link's target open frame
This article shares the specific code of the pull...
Problem Description MySQL reports an error when s...
Table of contents 1. Object 1.1 What is an object...
undefined In JavaScript, if we want to determine ...
This article example shares the specific code of ...
When nginx receives a request, it will first matc...
Let's take an example: The code is very simple...
Table of contents MySQL Client/Server Protocol If...
Methods for changing passwords before MySQL 5.7: ...
Just as the title! The commonly used font-family l...
Table of contents 1. Introduction to Nginx 2. Ima...
Preface Take Element Plus as an example to config...
1. Preparation before installation 1. Download th...
Follow the steps below 1. request.js content: htt...
First, let me give you an example (if you don’t w...