How to find and delete duplicate records in MySQL

How to find and delete duplicate records in MySQL

Hello everyone, I am Tony, a teacher who only talks about techniques and doesn’t cut hair. Due to some historical reasons or misoperation, there may be duplicate records in the data table; today we will talk about how to find duplicate data in MySQL tables and how to delete these duplicate records.

Create a sample table

First, create a sample table people and generate some data:

drop table if exists people;
create table people (
 id int auto_increment primary key,
 name varchar(50) not null,
 email varchar(100) not null
);

insert into people(name, email)
values ​​('张三', '[email protected]'),
  ('Li Si', '[email protected]'),
  ('Wang Wu', '[email protected]'),
  ('李斯', '[email protected]'),
  ('Wang Wu', '[email protected]'),
  ('Wang Wu', '[email protected]');

select * from people;
id|name |email |
--|------|-----------------|
 1|Zhang San|[email protected]|
 2|Li Si|[email protected] |
 3|Wang Wu|[email protected] |
 4|Lisi|[email protected] |
 5|Wang Wu|[email protected] |
 6|Wang Wu|[email protected] |

Among them, the email fields of 2 and 4 have duplicate data; the name and email fields of 3, 5, and 6 have duplicate data.

At this point, if we try to create a unique constraint for email, an error will be returned:

alter table people add constraint uk_people_email unique key (email);
ERROR 1062 (23000): Duplicate entry '[email protected]' for key 'people.uk_people_email'

Obviously, we must find and remove duplicate records in the email field to create a unique constraint.

Find duplicate data in a single field

If you want to find duplicate email data, you can group and count based on this field and return the groups with more than 1 row:

select email, count(email)
from people
group by email
having count(email) > 1;
email |count(email)|
---------------|------------|
[email protected] | 2|
[email protected]| 3|

The query results show that there are two duplicate email addresses. If you want to view the complete duplicate data, you can use a subquery or a join query:

select *
from people
where email in (
  select email
  from people
  group by email
  having count(email) > 1)
order by email;
id|name |email |
--|------|--------------|
 2|Li Si|[email protected] |
 4|Lisi|[email protected] |
 3|Wang Wu|[email protected]|
 5|Wang Wu|[email protected]|
 6|Wang Wu|[email protected]|

select p.*
from people
join (
 select email
 from people
 group by email
 having count(email) > 1
) d on p.email = d.email
order by email;
id|name |email |
--|------|--------------|
 2|Li Si|[email protected] |
 4|Lisi|[email protected] |
 3|Wang Wu|[email protected]|
 5|Wang Wu|[email protected]|
 6|Wang Wu|[email protected]|

Another way to find duplicate records is to use a self-join query and the distinct operator directly, for example:

select distinct p.*
from people
join people d on p.email = d.email
where p.id <> d.id
order by p.email;
id|name |email |
--|------|--------------|
 4|Lisi|[email protected] |
 2|Li Si|[email protected] |
 6|Wang Wu|[email protected]|
 5|Wang Wu|[email protected]|
 3|Wang Wu|[email protected]|

Note that distinct cannot be omitted, otherwise some data (3, 5, 6) will be returned multiple times.

Find duplicate data in multiple fields

If we want to find data with duplicate name and email fields, the implementation is similar:

select *
from people
where (name, email) in (
  select name, email
  from people
  group by name, email
  having count(1) > 1)
order by email;
id|name |email |
--|------|--------------|
 3|Wang Wu|[email protected]|
 5|Wang Wu|[email protected]|
 6|Wang Wu|[email protected]|

select distinct p.*
from people
join people d on p.name = d.name and p.email = d.email
where p.id <> d.id
order by email;
id|name |email |
--|------|--------------|
 6|Wang Wu|[email protected]|
 5|Wang Wu|[email protected]|
 3|Wang Wu|[email protected]|

Duplicate data is only when both name and email are the same, so 2 and 4 are not duplicate records.

Deduplication

After finding the duplicate data, we need to solve the problem of how to delete it. Usually we need to keep one of the records.

Deleting Duplicate Data Using DELETE FROM

If we want to delete duplicate email records and keep only one, we can use the DELETE FROM statement to achieve this:

delete p
from people
join people d on p.email = d.email and p.id < d.id;

The delete statement uses a connection to find the records to be deleted. The above example retains the data row corresponding to the largest ID in the duplicate data. Query the people table again:

select * from people;
id|name |email |
--|------|-----------------|
 1|Zhang San|[email protected]|
 4|Lisi|[email protected] |
 6|Wang Wu|[email protected] |

Think about it, if you want to keep the data with the smallest id among the duplicate data, how should you achieve it?

Deleting duplicate data using subqueries

You can use subqueries to find the data you need to keep and then delete the rest:

delete
from people
where id not in (
  select max(id)
  from people
  group by email
  );

Before executing the above statement, remember to recreate the people table and generate test data.

Deleting duplicate data through intermediate tables

Deletion of duplicate records can also be achieved by using an intermediate table, for example:

-- Create an intermediate table create table people_temp like people;

-- Copy the data rows that need to be retained and insert into people_temp (id, name, email)
select id, name, email
from people
where id in (
  select max(id)
  from people
  group by email
  );

--Delete the original table drop table people;

-- Rename the intermediate table to the original table alter table people_temp rename to people;

Before executing the above statement, remember to recreate the people table and generate test data.

One thing you need to pay attention to with this method is that the create table ... like statement will not copy the foreign key constraints on the original table and needs to be added manually.

Using window functions to remove duplicate data

ROW_NUMBER() is a new window function in MySQL 8.0 that can be used to group data and then assign a unique number to each piece of data. For example:

select id, name, email, 
  row_number() over (partition by email order by id) as row_num 
from people;
id|name |email |row_num|
--|------|-----------------|-------|
 2|Li Si|[email protected] | 1|
 4|Lisi|[email protected] | 2|
 3|Wang Wu|[email protected] | 1|
 5|Wang Wu|[email protected] | 2|
 6|Wang Wu|[email protected] | 3|
 1|Zhangsan|[email protected]| 1|

The above statement groups the data by email (partition by email), sorts them by id (order by id), and then assigns a number to the data in each group; if the number is greater than 1, it means there is duplicate data.

📝In addition to ROW_NUMBER(), RANK() or DENSE_RANK() functions can also achieve the above functions. For an introduction and use cases of window functions, please refer to this article.

Based on the query results, duplicate records can be deleted:

delete
from people
where id in (
 select id
 from (
  select id,
    row_number() over (partition by email order by id desc) as row_num 
  from people)
 where row_num > 1);

Before executing the above statement, remember to recreate the people table and generate test data.

The method for deduplicating data based on multiple fields is very similar to that for a single field. You can try it yourself, and you are welcome to leave a message for discussion!

Summarize

This article introduces how to find and delete duplicate records in MySQL, including using GROUP BY, subqueries, or join queries to find duplicate data in a single field or multiple fields, and using DELETE FROM statements, subqueries, intermediate tables, and window functions to delete duplicate data. For more information about MySQL Find and Delete Duplicate Records, please search 123WORDPRESS.COM's previous articles or continue browsing the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • How to check if a table exists in MySQL and then delete it in batches
  • Why the table file size remains unchanged after deleting data in MySQL
  • Three ways to delete a table in MySQL (summary)
  • Detailed explanation of several practical solutions for quickly deleting large amounts of data (tens of millions) in MySQL
  • Specific method to delete mysql service
  • MySQL's method of dealing with duplicate data (preventing and deleting)
  • MySQL database operations (create, select, delete)
  • How to recover deleted MySQL 8.0.17 root account and password under Windows
  • How to completely delete the MySQL 8.0 service under Linux
  • MySQL table deletion operation implementation (differences between delete, truncate, and drop)
  • Troubleshooting the reasons why MySQL deleted records do not take effect

<<:  FastDFS and Nginx integration to achieve code analysis

>>:  js realizes the dynamic loading of data by waterfall flow bottoming out

Recommend

Summary of Form Design Techniques in Web Design

“Inputs should be divided into logical groups so ...

Example of using CSS3 to customize the style of input multiple-select box

Principle: First hide the input element, then use...

JavaScript imitates Taobao magnifying glass effect

This article shares the specific code for JavaScr...

Sample code using scss in uni-app

Pitfalls encountered I spent the whole afternoon ...

How to use @media in mobile adaptive styles

General mobile phone style: @media all and (orien...

MySQL 8.0.15 winx64 installation and configuration method graphic tutorial

This article shares the installation and configur...

Summary of frequently used commands for Linux file operations

0. New operation: mkdir abc #Create a new folder ...

Detailed explanation of the difference between Vue life cycle

Life cycle classification Each component of vue i...

Nginx installation error solution

1. Unzip nginx-1.8.1.tar.gz 2. Unzip fastdfs-ngin...

MySQL 8.0.11 MSI version installation and configuration graphic tutorial

This article shares the installation and configur...

3 codes for automatic refresh of web pages

In fact, it is very simple to achieve this effect,...

How to connect to MySQL visualization tool Navicat

After installing Navicat The following error may ...

10 very good CSS skills collection and sharing

Here, clever use of CSS techniques allows you to g...

Some conclusions on developing mobile websites

The mobile version of the website should at least...

Combining XML and CSS styles

student.xml <?xml version="1.0" enco...