One sql statement completes MySQL deduplication and keeps one

One sql statement completes MySQL deduplication and keeps one

A few days ago, when I was working on a requirement, I needed to clean up duplicate records in MySQL. At that time, I thought of writing it out through code traversal, but then I felt it was too complicated, and I thought that the problem should be solved through a SQL statement. After checking the information and consulting the experts, I came up with a very convenient SQL statement. Here I share this SQL statement and ideas.

Demand Analysis

There are duplicate records in the database, delete one and keep one (the basis for determining whether it is a duplicate is multiple fields)

Solution

When I encountered this requirement, I probably had some ideas in mind. The first thing that comes to my mind is that it can be solved with a SQL statement, but I am not very experienced in complex SQL statements, so I want to ask an expert for help.

Find someone to help

Because this requirement is a bit urgent, the first thing I thought of was to find a colleague in this field to solve it, and then share this problem with my colleague. As a result, this guy just Baidu and gave me a SQL statement that I had never used before, and asked me to try it myself. A lot of things rushed through my heart...

Baidu

Found a sql statement:

DELETE
FROM
 vitamin a
WHERE
 (a.peopleId, a.seq) IN (
  SELECT
   peopleId,
   seq
  FROM
   vitae
  GROUP BY
   peopleId,
   seq
  HAVING
   count(*) > 1
 )
AND rowid NOT IN (
 SELECT
  min(rowid)
 FROM
  vitae
 GROUP BY
  peopleId,
  seq
 HAVING
  count(*) > 1
)

This statement is found in the article [Delete duplicate data in MySQL and keep only one]. The idea of ​​this SQL statement is very clear, and there are three steps:

SELECT peopleId, seq FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1 to find duplicate records in the table as a condition

SELECT min(rowid) FROM vitae GROUP BY peopleId, seq HAVING count(*) > 1 Query the smallest ID value in the duplicate records in the table as the second condition

Finally, according to the above two conditions, delete the remaining duplicate records except the smallest ID in the duplicate records

But unfortunately, an error occurred when running this statement. The general meaning of the error is that the table cannot be updated at the same time as the query.

Code Solution

Based on the above SQL statement, it is possible to achieve the same purpose in two steps through code:

Remove the duplicate data sets first

According to the queried data set, loop to delete the remaining duplicate data

I had the idea and wrote it quickly, but I was shocked when I ran it. It took about 116 seconds . Then I thought I must find a SQL statement that can be used. I posted the code and the running results:

Perfect [Remove Duplicates and Keep One] SQL

Finally, I got the perfect answer in a technical group. Look at this SQL statement:

DELETE consum_record
FROM
 consum_record, 
 (
  SELECT
   min(id) id,
   user_id,
   monetary,
   consume_time
  FROM
   consum_record
  GROUP BY
   user_id,
   monetary,
   consume_time
  HAVING
   count(*) > 1
 ) t2
WHERE
 consum_record.user_id = t2.user_id 
 and consum_record.monetary = t2.monetary
 and consum_record.consume_time = t2.consume_time
AND consum_record.id > t2.id;

If you look carefully at the above SQL statement, it is not difficult to figure out the idea. It can be understood in three steps:

(SELECT min(id) id, user_id, monetary, consume_time FROM consum_record GROUP BY user_id, monetary, consume_time HAVING count(*) > 1 ) t2 (temporary table t2), which contains the minimum ID of each duplicate record

consum_record.user_id = t2.user_id and consum_record.monetary = t2.monetary and consum_record.consume_time = t2.consume_time are the fields that are used to determine the duplication basis.

According to the conditions, delete the records in the original table whose id is greater than the id in t2

When I saw this sentence, I thought it was amazing. Such a simple SQL statement can actually solve such a complex problem. It's really interesting!

It also runs super fast. The original code loop execution takes about 116 seconds , but here it only takes 0.3 seconds . Amazing~

Summarize

As a PHP programmer, it stands to reason that SQL should not lag behind. However, in reality, there are too many things to do and my current SQL level is only at an average level. I will find an opportunity to improve my knowledge in this area in the future.

That’s all for today.

You may also be interested in:
  • MySQL development skills: JOIN update and data duplication check/deduplication
  • MySQL data duplicate checking and deduplication implementation statements
  • MySQL deduplication methods
  • A brief discussion on deduplication in SQL database
  • A small example of SQL grouping and sorting to remove duplicates
  • Detailed explanation of two methods of deduplication in MySQL and example code
  • A practical record of how to check and remove duplicate SQL

<<:  js implements simple provincial, municipal and district three-level selection cascade

>>:  Detailed analysis of when tomcat writes back the response datagram

Recommend

Detailed explanation of MySQL injection without knowing the column name

Preface I feel like my mind is empty lately, as I...

How to clear default styles and set common styles in CSS

CSS Clear Default Styles The usual clear default ...

Using HTML to implement a voting website cheating scheme that restricts IP

This is a cheating scheme for voting websites wit...

How to configure anti-hotlinking for nginx website service (recommended)

1. Principle of Hotlinking 1.1 Web page preparati...

How to use Nexus to add jar packages to private servers

Why do we need to build a nexus private server? T...

HTML form component example code

HTML forms are used to collect different types of...

Detailed explanation of the mechanism and implementation of accept lock in Nginx

Preface nginx uses a multi-process model. When a ...

Detailed steps for installing Tomcat, MySQL and Redis with Docker

Table of contents Install Tomcat with Docker Use ...

Detailed explanation of the pitfalls of nginx proxy socket.io service

Table of contents Nginx proxies two socket.io ser...

Solution to no Chinese input method in Ubuntu

There is no solution for Chinese input method und...

Summary of MySQL 8.0 Online DDL Quick Column Addition

Table of contents Problem Description Historical ...

React implements dynamic pop-up window component

When we write some UI components, if we don't...

Alibaba Cloud applies for a free SSL certificate (https) from Cloud Shield

Because the project needs to use https service, I...

Example of Html shielding right-click menu and left-click typing function

Disable right-click menu <body oncontextmenu=s...

Linux server SSH cracking prevention method (recommended)

1. The Linux server configures /etc/hosts.deny to...