Hello everyone, I am Tony, a teacher who only talks about techniques and doesn’t cut hair. Due to some historical reasons or misoperation, there may be duplicate records in the data table; today we will talk about how to find duplicate data in MySQL tables and how to delete these duplicate records. Create a sample table First, create a sample table people and generate some data: drop table if exists people; create table people ( id int auto_increment primary key, name varchar(50) not null, email varchar(100) not null ); insert into people(name, email) values ('张三', '[email protected]'), ('Li Si', '[email protected]'), ('Wang Wu', '[email protected]'), ('李斯', '[email protected]'), ('Wang Wu', '[email protected]'), ('Wang Wu', '[email protected]'); select * from people; id|name |email | --|------|-----------------| 1|Zhang San|[email protected]| 2|Li Si|[email protected] | 3|Wang Wu|[email protected] | 4|Lisi|[email protected] | 5|Wang Wu|[email protected] | 6|Wang Wu|[email protected] | Among them, the email fields of 2 and 4 have duplicate data; the name and email fields of 3, 5, and 6 have duplicate data. At this point, if we try to create a unique constraint for email, an error will be returned: alter table people add constraint uk_people_email unique key (email); ERROR 1062 (23000): Duplicate entry '[email protected]' for key 'people.uk_people_email' Obviously, we must find and remove duplicate records in the email field to create a unique constraint. Find duplicate data in a single field If you want to find duplicate email data, you can group and count based on this field and return the groups with more than 1 row: select email, count(email) from people group by email having count(email) > 1; email |count(email)| ---------------|------------| [email protected] | 2| [email protected]| 3| The query results show that there are two duplicate email addresses. If you want to view the complete duplicate data, you can use a subquery or a join query: select * from people where email in ( select email from people group by email having count(email) > 1) order by email; id|name |email | --|------|--------------| 2|Li Si|[email protected] | 4|Lisi|[email protected] | 3|Wang Wu|[email protected]| 5|Wang Wu|[email protected]| 6|Wang Wu|[email protected]| select p.* from people join ( select email from people group by email having count(email) > 1 ) d on p.email = d.email order by email; id|name |email | --|------|--------------| 2|Li Si|[email protected] | 4|Lisi|[email protected] | 3|Wang Wu|[email protected]| 5|Wang Wu|[email protected]| 6|Wang Wu|[email protected]| Another way to find duplicate records is to use a self-join query and the distinct operator directly, for example: select distinct p.* from people join people d on p.email = d.email where p.id <> d.id order by p.email; id|name |email | --|------|--------------| 4|Lisi|[email protected] | 2|Li Si|[email protected] | 6|Wang Wu|[email protected]| 5|Wang Wu|[email protected]| 3|Wang Wu|[email protected]| Note that distinct cannot be omitted, otherwise some data (3, 5, 6) will be returned multiple times. Find duplicate data in multiple fields If we want to find data with duplicate name and email fields, the implementation is similar: select * from people where (name, email) in ( select name, email from people group by name, email having count(1) > 1) order by email; id|name |email | --|------|--------------| 3|Wang Wu|[email protected]| 5|Wang Wu|[email protected]| 6|Wang Wu|[email protected]| select distinct p.* from people join people d on p.name = d.name and p.email = d.email where p.id <> d.id order by email; id|name |email | --|------|--------------| 6|Wang Wu|[email protected]| 5|Wang Wu|[email protected]| 3|Wang Wu|[email protected]| Duplicate data is only when both name and email are the same, so 2 and 4 are not duplicate records. Deduplication After finding the duplicate data, we need to solve the problem of how to delete it. Usually we need to keep one of the records. Deleting Duplicate Data Using DELETE FROM If we want to delete duplicate email records and keep only one, we can use the DELETE FROM statement to achieve this: delete p from people join people d on p.email = d.email and p.id < d.id; The delete statement uses a connection to find the records to be deleted. The above example retains the data row corresponding to the largest ID in the duplicate data. Query the people table again: select * from people; id|name |email | --|------|-----------------| 1|Zhang San|[email protected]| 4|Lisi|[email protected] | 6|Wang Wu|[email protected] | Think about it, if you want to keep the data with the smallest id among the duplicate data, how should you achieve it? Deleting duplicate data using subqueries You can use subqueries to find the data you need to keep and then delete the rest: delete from people where id not in ( select max(id) from people group by email ); Before executing the above statement, remember to recreate the people table and generate test data. Deleting duplicate data through intermediate tables Deletion of duplicate records can also be achieved by using an intermediate table, for example: -- Create an intermediate table create table people_temp like people; -- Copy the data rows that need to be retained and insert into people_temp (id, name, email) select id, name, email from people where id in ( select max(id) from people group by email ); --Delete the original table drop table people; -- Rename the intermediate table to the original table alter table people_temp rename to people; Before executing the above statement, remember to recreate the people table and generate test data. One thing you need to pay attention to with this method is that the create table ... like statement will not copy the foreign key constraints on the original table and needs to be added manually. Using window functions to remove duplicate data ROW_NUMBER() is a new window function in MySQL 8.0 that can be used to group data and then assign a unique number to each piece of data. For example: select id, name, email, row_number() over (partition by email order by id) as row_num from people; id|name |email |row_num| --|------|-----------------|-------| 2|Li Si|[email protected] | 1| 4|Lisi|[email protected] | 2| 3|Wang Wu|[email protected] | 1| 5|Wang Wu|[email protected] | 2| 6|Wang Wu|[email protected] | 3| 1|Zhangsan|[email protected]| 1| The above statement groups the data by email (partition by email), sorts them by id (order by id), and then assigns a number to the data in each group; if the number is greater than 1, it means there is duplicate data. 📝In addition to ROW_NUMBER(), RANK() or DENSE_RANK() functions can also achieve the above functions. For an introduction and use cases of window functions, please refer to this article. Based on the query results, duplicate records can be deleted: delete from people where id in ( select id from ( select id, row_number() over (partition by email order by id desc) as row_num from people) where row_num > 1); Before executing the above statement, remember to recreate the people table and generate test data. The method for deduplicating data based on multiple fields is very similar to that for a single field. You can try it yourself, and you are welcome to leave a message for discussion! Summarize This article introduces how to find and delete duplicate records in MySQL, including using GROUP BY, subqueries, or join queries to find duplicate data in a single field or multiple fields, and using DELETE FROM statements, subqueries, intermediate tables, and window functions to delete duplicate data. For more information about MySQL Find and Delete Duplicate Records, please search 123WORDPRESS.COM's previous articles or continue browsing the following related articles. I hope everyone will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: FastDFS and Nginx integration to achieve code analysis
>>: js realizes the dynamic loading of data by waterfall flow bottoming out
“Inputs should be divided into logical groups so ...
Principle: First hide the input element, then use...
This article shares the specific code for JavaScr...
Pitfalls encountered I spent the whole afternoon ...
General mobile phone style: @media all and (orien...
This article shares the installation and configur...
0. New operation: mkdir abc #Create a new folder ...
Life cycle classification Each component of vue i...
1. Unzip nginx-1.8.1.tar.gz 2. Unzip fastdfs-ngin...
This article shares the installation and configur...
In fact, it is very simple to achieve this effect,...
After installing Navicat The following error may ...
Here, clever use of CSS techniques allows you to g...
The mobile version of the website should at least...
student.xml <?xml version="1.0" enco...