MySQL deduplication methods

MySQL deduplication methods

MySQL deduplication methods

【Beginner】There are very few repeated lines

Use distinct to find them out, and then delete them manually one by one.

[Intermediate] Remove duplicates based on a single field

For example: remove duplicates from the id field

How to use: Get the value of the repeated field of id, compare the fields with different data in the rows with the same id field, and delete all repeated rows except the row with the smallest (or largest) field. Generally, the primary key is used for comparison because the value of the primary key must be unique and absolutely different.

id name

1 a

1 b

2 c

2 a

3 c

result:

id name

1 a 

2 a

operate:

delete from a_tmp 

where id in (select * from (select b.id from a_tmp b group by b.id having count(b.id) >1) bb) 

and name not in (select * from (select min(a.name) from a_tmp a GROUP BY a.id having count(a.id) >1) aa);

Notice:

The bold and green words above must be aliased and must use the select * from (……) format, otherwise an error will be reported:

[Err] 1093 - You can't specify target table 'a_tmp' for update in FROM clause

[Advanced] Remove duplicates by repetition of multiple fields

For example, to remove duplicates of the same id and name, that is, to count the same id and name as duplicate rows, and to count the same id but different names as non-duplicate rows

Usage: Similar to a single field, the primary key is generally used for comparison because the value of the primary key must be unique.

id name rowid

1a1

1 a 2

1 b 3

2 b 4

2 b 5

3 c 6

3d7

result:

id name rowid

1a1 

1 b 3

2 b 4

3 c 6

3d7

operate:

The first one:

delete from a_tmp 

where (id,name) in (select * from (select b.id,b.name from a_tmp b group by b.id,b.name having count(b.id) >1) bb) 

and rowid not in (select * from (select min(a.rowid) from a_tmp a group by a.id,a.name having count(a.id) >1) aa);

Second type:

Connect the values ​​of the id and name fields and insert them into the temporary table b_tmp. Then you can use the [Intermediate] single field judgment and deletion method.

#Insert the value of the two fields connected and the field with the unique value in the a_tmp table into the b_tmp table

insert into b_tmp 

 select concat(id,name),rowid from a_tmp;

#Find out the rows that need to be kept select id_name,max(rowid)

 from b_tmp 

 group by id_name

 having count(id_name)>1;

#Use the [Intermediate] method or stored procedure to complete the deduplication work

[Ultimate] Each row has two copies of the same data

For example:

How to use: If the data in the entire row is the same, it is impossible to use SQL statements to delete it, because there is no conditional restriction that can be used to keep one row and delete all the identical rows. There are no different fields that can be created by yourself, that is: add a field, set it to auto-increment, and set it as the primary key, it will automatically add values.

id name

1 a

1 a

1 b

1 b

2 c

2 c

3 c

3 c

result:

id name rowid

1a1

1 b 3

2 c 5

3 c 7

operate:

Add an auto-increment field and temporarily set it as the primary key.

Use the [Intermediate] and [Advanced] methods above.

Thank you for reading, I hope it can help you, thank you for your support of this site!

You may also be interested in:
  • MySQL development skills: JOIN update and data duplication check/deduplication
  • MySQL data duplicate checking and deduplication implementation statements
  • One sql statement completes MySQL deduplication and keeps one
  • A brief discussion on deduplication in SQL database
  • A small example of SQL grouping and sorting to remove duplicates
  • Detailed explanation of two methods of deduplication in MySQL and example code
  • A practical record of how to check and remove duplicate SQL

<<:  CentOS7 deployment Flask (Apache, mod_wsgi, Python36, venv)

>>:  Detailed explanation of Docker+Jenkins+Gitlab+Django application deployment practice

Recommend

SQL implementation of LeetCode (177. Nth highest salary)

[LeetCode] 177.Nth Highest Salary Write a SQL que...

Analysis of the use of the MySQL database show processlist command

In actual project development, if we have a lot o...

WeChat applet implements the Record function

This article shares the specific code for the WeC...

Pure CSS meteor shower background sample code

GitHub address, you can star it if you like it Pl...

The actual process of encapsulating axios in the project

Table of contents Preface Benefits of axios encap...

How to optimize images to improve website performance

Table of contents Overview What is Image Compress...

mysql5.5 installation graphic tutorial under win7

MySQL installation is relatively simple, usually ...

A small question about the execution order of SQL in MySQL

I encountered a sql problem at work today, about ...

Several ways of running in the background of Linux (summary)

1. nohup Run the program in a way that ignores th...

The difference between docker run and start

The difference between run and start in docker Do...

Learn about CSS label display mode in one article

Tag type (display mode) HTML tags are generally d...

MySQL character set garbled characters and solutions

Preface A character set is a set of symbols and e...