MySQL deduplication methods

[Beginner] Very few duplicate rows

Use distinct to find the duplicates, then delete the extra copies manually, one by one.
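
As a minimal sketch, assuming a table called a_tmp with columns id and name (the names used in the examples later in this article), the duplicates can be listed and one surplus copy removed by hand roughly like this. The column types and sample values are made up for illustration.

```sql
-- Scratch table for illustration only; run this in a throwaway schema.
drop table if exists a_tmp;
create table a_tmp (id int, name varchar(10));
insert into a_tmp values (1, 'a'), (1, 'a'), (2, 'b');

-- distinct returns each row once; fewer rows than count(*) means duplicates exist.
select distinct id, name from a_tmp;

-- group by with a count shows which rows are repeated and how often.
select id, name, count(*) as copies
from a_tmp
group by id, name
having count(*) > 1;

-- Remove a single surplus copy by hand; limit 1 stops after one matching row.
delete from a_tmp where id = 1 and name = 'a' limit 1;
```

The limit clause matters here: without it the delete would remove every copy, including the one meant to be kept.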

[Intermediate] Remove duplicates based on a single field

For example: remove rows whose id value is repeated.

How to use: Find the id values that occur more than once. Among the rows that share an id, compare a field whose values differ (name in this example) and delete every row except the one with the smallest (or largest) value. The primary key is usually the field compared, because primary key values are guaranteed to be unique.

Sample data (table a_tmp):

id  name
1   a
1   b
2   c
2   a
3   c

Result:

id  name
1   a
2   a
3   c

Statement:

delete from a_tmp
where id in (select * from (select b.id from a_tmp b group by b.id having count(b.id) > 1) bb)
and name not in (select * from (select min(a.name) from a_tmp a group by a.id having count(a.id) > 1) aa);

Note:

Each inner query must be wrapped as select * from (...) and given an alias (bb and aa above). The extra layer turns the subquery into a derived table that MySQL evaluates before the delete runs. Without it, MySQL reports:

[Err] 1093 - You can't specify target table 'a_tmp' for update in FROM clause
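
One caveat with the statement above: the not in list is collected across all ids, so a name that happens to be the minimum for one id is also protected under every other id. A possible alternative, sketched here rather than taken from the original method, is a multi-table delete with a self-join. It compares names only within the same id, and it does not trigger error 1093 because no subquery reads from the target table. The column types are assumptions.

```sql
-- Scratch copy of the sample data; run this in a throwaway schema.
drop table if exists a_tmp;
create table a_tmp (id int, name varchar(10));
insert into a_tmp values (1, 'a'), (1, 'b'), (2, 'c'), (2, 'a'), (3, 'c');

-- Delete every row for which another row with the same id has a smaller name.
delete t1
from a_tmp t1
join a_tmp t2
  on t1.id = t2.id
 and t1.name > t2.name;

-- Remaining rows: (1, 'a'), (2, 'a'), (3, 'c')
select id, name from a_tmp order by id;
```

Rows that match in both id and name are left untouched by this join, since neither copy has the smaller name. That case needs the approach from the [Ultimate] section.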

[Advanced] Remove duplicates based on multiple fields

For example, deduplicate on id and name together: rows with the same id and the same name count as duplicates, while rows with the same id but different names do not.

How to use: The same as for a single field. The primary key is usually the field compared, because its values are guaranteed to be unique.

Sample data (table a_tmp):

id  name  rowid
1   a     1
1   a     2
1   b     3
2   b     4
2   b     5
3   c     6
3   d     7

Result:

id  name  rowid
1   a     1
1   b     3
2   b     4
3   c     6
3   d     7

Statement:

Method 1:

delete from a_tmp
where (id,name) in (select * from (select b.id,b.name from a_tmp b group by b.id,b.name having count(b.id) > 1) bb)
and rowid not in (select * from (select min(a.rowid) from a_tmp a group by a.id,a.name having count(a.id) > 1) aa);
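
Because rowid is unique, the same result can be reached with a shorter statement: keep the smallest rowid of every (id, name) group and delete everything else. This is a sketch, not the statement given above. The column types and the alias keep_rows are assumptions, and the derived-table wrapper is still needed to avoid error 1093.

```sql
-- Scratch copy of the sample data; run this in a throwaway schema.
drop table if exists a_tmp;
create table a_tmp (id int, name varchar(10), rowid int primary key);
insert into a_tmp values
  (1, 'a', 1), (1, 'a', 2), (1, 'b', 3), (2, 'b', 4),
  (2, 'b', 5), (3, 'c', 6), (3, 'd', 7);

-- Keep the smallest rowid in each (id, name) group and delete every other row.
delete from a_tmp
where rowid not in (
  select * from (
    select min(rowid) from a_tmp group by id, name
  ) keep_rows
);

-- Remaining rowids: 1, 3, 4, 6, 7
select id, name, rowid from a_tmp order by rowid;
```

Groups with a single row keep that row, because it is its own minimum, so no having count(...) > 1 filter is required.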

Method 2:

Concatenate the values of the id and name fields and insert them into a temporary table b_tmp. The [Intermediate] single-field method can then be applied to the concatenated field.

#Insert the concatenated value of the two fields, plus the unique-valued field, from a_tmp into b_tmp
insert into b_tmp
 select concat(id,name),rowid from a_tmp;

#Find the row to keep in each duplicated group (max keeps the largest rowid; use min to match the result above)
select id_name,max(rowid)
 from b_tmp
 group by id_name
 having count(id_name)>1;

#Use the [Intermediate] method or a stored procedure to finish the deduplication
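
The whole of Method 2 could look roughly like the sketch below. Two things differ from the statements above and are my substitutions: concat_ws with a separator replaces concat, because plain concat maps both (1, '12') and (11, '2') to '112', and min(rowid) is kept so that the outcome matches the result table. The b_tmp column definitions and the '|' separator (assumed never to appear in name) are hypothetical.

```sql
-- Scratch copy of the sample data; run this in a throwaway schema.
drop table if exists a_tmp;
create table a_tmp (id int, name varchar(10), rowid int primary key);
insert into a_tmp values
  (1, 'a', 1), (1, 'a', 2), (1, 'b', 3), (2, 'b', 4),
  (2, 'b', 5), (3, 'c', 6), (3, 'd', 7);

-- Helper table holding the combined key and the unique rowid.
drop table if exists b_tmp;
create table b_tmp (id_name varchar(32), rowid int primary key);

-- Build the combined key with a separator so different pairs cannot collide.
insert into b_tmp
select concat_ws('|', id, name), rowid from a_tmp;

-- The subquery reads b_tmp, not a_tmp, so error 1093 does not apply
-- and no select * from (...) wrapper is needed.
delete from a_tmp
where rowid not in (select min(rowid) from b_tmp group by id_name);

-- Remaining rowids: 1, 3, 4, 6, 7
select id, name, rowid from a_tmp order by rowid;
```

b_tmp is created as an ordinary table here and can be dropped once the delete has run.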

[Ultimate] Entire rows are duplicated

How to use: When every field in two rows is identical, no where condition can tell the copies apart, so a condition-based delete removes either all copies or none. The fix is to create a distinguishing field yourself: add a column, set it to auto-increment, and make it the primary key. MySQL fills in its values automatically.

For example:

id  name
1   a
1   a
1   b
1   b
2   c
2   c
3   c
3   c

Result (after adding the rowid field):

id  name  rowid
1   a     1
1   b     3
2   c     5
3   c     7

Steps:

Add an auto-increment field and temporarily set it as the primary key.

Then apply the [Intermediate] or [Advanced] method above.
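
Put together, the two steps might look like the following sketch. It assumes a_tmp has no primary key yet. The column name rowid, the column types, and the alias keep_rows are illustrative choices.

```sql
-- Scratch copy of the sample data; run this in a throwaway schema.
drop table if exists a_tmp;
create table a_tmp (id int, name varchar(10));
insert into a_tmp values
  (1, 'a'), (1, 'a'), (1, 'b'), (1, 'b'),
  (2, 'c'), (2, 'c'), (3, 'c'), (3, 'c');

-- Step 1: add the distinguishing column; MySQL numbers the existing rows itself.
alter table a_tmp add column rowid int not null auto_increment primary key;

-- Step 2: keep the smallest rowid per (id, name) pair and delete the other copies.
delete from a_tmp
where rowid not in (
  select * from (
    select min(rowid) from a_tmp group by id, name
  ) keep_rows
);

-- Optional cleanup: remove the helper column once the duplicates are gone.
alter table a_tmp drop column rowid;

-- One row per (id, name) pair remains.
select id, name from a_tmp order by id, name;
```

Dropping the helper column is only appropriate if the table is meant to stay without a primary key. Keeping it, or adding a unique index on (id, name), would stop the same duplicates from being inserted again.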
