1. How to find duplicate rowsThe first step is to define what counts as a duplicate row. In most cases this is simple: they have the same value in one column. This article uses this definition. Perhaps your definition of "duplicate" is more complicated than this, and you need to make some modifications to the SQL. The data samples used in this article are: create table test(id int not null primary key, day date not null); insert into test(id, day) values(1, '2006-10-08'); insert into test(id, day) values(2, '2006-10-08'); insert into test(id, day) values(3, '2006-10-09'); select * from test; +----+------------+ | id | day | +----+------------+ | 1 | 2006-10-08 | | 2 | 2006-10-08 | | 3 | 2006-10-09 | +----+------------+ The first two rows have the same value in select day, count(*) from test GROUP BY day; +------------+----------+ | day | count(*) | +------------+----------+ | 2006-10-08 | 2 | | 2006-10-09 | 1 | +------------+----------+
select day, count(*) from test group by day HAVING count(*) > 1; +------------+----------+ | day | count(*) | +------------+----------+ | 2006-10-08 | 2 | +------------+----------+ This is the basic technique: group by a field with the same value, then display groups with a size greater than 1. Why can't I use the WHERE clause? Because the WHERE clause filters the rows before grouping, and the HAVING clause filters the rows after grouping. 2. How to delete duplicate rowsA related question is how to remove duplicate rows. A common task is to keep only one row of duplicates and delete the others. You can then create appropriate indexes to prevent duplicate rows from being written to the database in the future. Again, the first thing to do is to figure out what a duplicate row is. Which line do you want to keep? The first row, or the row with the maximum value for a certain field? In this article, it is assumed that you want to keep the first row - the row with the smallest value in the id field, which means you want to delete the other rows. Perhaps the easiest way to do this is via a temporary table. Especially for Our task is to remove all duplicate rows except the row with the minimum value of create temporary table to_delete (day date not null, min_id int not null); insert into to_delete(day, min_id) select day, MIN(id) from test group by day having count(*) > 1; select * from to_delete; +------------+--------+ | day | min_id | +------------+--------+ | 2006-10-08 | 1 | +------------+--------+ With this data in hand, you can start deleting the "dirty" rows. There are several approaches, each with its own advantages and disadvantages, but I won't compare them in detail here, except to explain the standard approach used in relational databases that support query clauses. delete from test where exists( select * from to_delete where to_delete.day = test.day and to_delete.min_id <> test.id ) 3. How to find duplicates on multiple columnsSomeone recently asked this question: I have two fields b and c in one of my tables, which are respectively associated with the b and c fields of two other tables. I want to find out the rows that have duplicate values on either the b field or the c field. It was hard to understand at first, but after a conversation I understood: he wanted to create create table a_b_c( a int not null primary key auto_increment, b int, c int ); insert into a_b_c(b,c) values (1, 1); insert into a_b_c(b,c) values (1, 2); insert into a_b_c(b,c) values (1, 3); insert into a_b_c(b,c) values (2, 1); insert into a_b_c(b,c) values (2, 2); insert into a_b_c(b,c) values (2, 3); insert into a_b_c(b,c) values (3, 1); insert into a_b_c(b,c) values (3, 2); insert into a_b_c(b,c) values (3, 3); Now, you can easily see that there are some duplicate rows in the table, but no two rows have the same tuple {b, c}. That's why the problem becomes difficult. 4. Incorrect query statementsIf you group both columns together you will get different results depending on how you group and calculate the size. This is exactly where the questioner is stuck. Sometimes a query finds some duplicate rows but misses others. This is the query he used select b, c, count(*) from a_b_c group by b, c having count(distinct b > 1) or count(distinct c > 1); The result returns all rows because select b, c, count(*) from a_b_c group by b, c having count(1) or count(1); Why? Because (b > 1) is a boolean value, which is not what you want at all. What you want is select b, c, count(*) from a_b_c group by b, c having count(distinct b) > 1 or count(distinct c) > 1; Returns empty result. Obviously, since there are no duplicates of select b, count(*) from a_b_c group by b having count(distinct c) > 1; +------+----------+ | b | count(*) | +------+----------+ | 1 | 3 | | 2 | 3 | | 3 | 3 | +------+----------+ None of them can find all the duplicate rows. And the most frustrating thing is that for some situations, this statement is valid, if you mistakenly think that this is the way to write it, but for other situations, you may get the wrong result. In fact, simply using When you sort (group) the b field, the same value of c is divided into different groups, so you cannot 5. Several correct methodsPerhaps the simplest way is to find duplicate rows for each field individually, then combine them using UNION, like this: select b as value, count(*) as cnt, 'b' as what_col from a_b_c group by b having count(*) > 1 union select c as value, count(*) as cnt, 'c' as what_col from a_b_c group by c having count(*) > 1; +-------+-----+----------+ | value | cnt | what_col | +-------+-----+----------+ | 1 | 3 | b | | 2 | 3 | b | | 3 | 3 | b | | 1 | 3 | c | | 2 | 3 | c | | 3 | 3 | c | +-------+-----+----------+ The select a, b, c from a_b_c where b in (select b from a_b_c group by b having count(*) > 1) or c in (select c from a_b_c group by c having count(*) > 1); +----+------+------+ | a | b | c | +----+------+------+ | 7 | 1 | 1 | | 8 | 1 | 2 | | 9 | 1 | 3 | | 10 | 2 | 1 | | 11 | 2 | 2 | | 12 | 2 | 3 | | 13 | 3 | 1 | | 14 | 3 | 2 | | 15 | 3 | 3 | +----+------+------+ This approach is much less efficient than using select a, a_b_c.b, a_b_c.c from a_b_c left outer join ( select b from a_b_c group by b having count(*) > 1 ) as b on a_b_c.b = bb left outer join ( select c from a_b_c group by c having count(*) > 1 ) as c on a_b_c.c = cc where bb is not null or cc is not null The above methods work, and I'm sure there are others. If This is the end of this article about how to find and delete duplicate rows in MySQL. For more information about how to find and delete duplicate rows in MySQL, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future! You may also be interested in:
|
<<: A Deeper Look at the Differences Between Link and @import
>>: Difference between var and let in JavaScript
Object.defineProperty Understanding grammar: Obje...
Special note: Only the Swoole extension is instal...
Table of contents 1. Monitoring port Relationship...
The loading speed of a web page is an important in...
MySQL 8.0.18 stable version (GA) was officially r...
The specific code for JavaScript to implement the...
By default, the width and height of the header ar...
There are two main reasons why it is difficult to...
The solutions to the problems encountered during x...
When designing H5 layout, you will usually encoun...
Table of contents 1. Constructors and instances 2...
Table of contents 502 bad gateway error formation...
Table of contents Preface 1. Define label style 2...
Steps to configure whitelist access in mysql 1. L...
Table of contents 1. parse 1.1 Rules for intercep...