Should I use distinct or group by to remove duplicates in MySQL?

Summary

| Performance comparison | Small data, few categories | Small data, many categories | Large data, many categories |
| --- | --- | --- | --- |
| Without index | Almost the same | DISTINCT slightly better | DISTINCT better |
| With index | Almost the same | Almost the same | Almost the same |

For deduplication, DISTINCT is the better choice when the column has no index. When the column is indexed, DISTINCT and GROUP BY perform about the same, so either can be used.


Preface

Regarding the performance of GROUP BY versus DISTINCT, the conclusions commonly found online are:

  • Without an index, DISTINCT performs better on small data sets and GROUP BY performs better on large ones.
  • With an index, GROUP BY performs better.
  • When the index is used, the fewer groups there are, the faster DISTINCT is.

This article tests these claims.

Preparation: disable the query cache

First check whether the query cache is enabled in MySQL. Turn it off so that it cannot affect the test results. The query cache exists only up to MySQL 5.7 and was removed in MySQL 8.0.

show variables like '%query_cache%'; 


Whether the query cache is enabled is determined by query_cache_type and query_cache_size.

  • Method 1: Turn off the query cache in the configuration file. Edit my.ini (here C:\ProgramData\MySQL\MySQL Server 5.7\my.ini), set query_cache_type=0 or 2, and restart the MySQL server.
  • Method 2: Set query_cache_size to 0 by executing the following statement:
set global query_cache_size = 0;

  • Method 3: If you don't want to turn off the query cache, clear it with RESET QUERY CACHE instead.

The test environment used here has query_cache_type=2 (DEMAND). In this mode, queries are not cached by default. A query is cached only when the statement explicitly includes SQL_CACHE.
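The three settings above interact in a specific way. The sketch below is a simplified Python model (not MySQL code) of when MySQL 5.7 would store a SELECT result in the query cache. The function names are hypothetical. The model also ignores the other rules that make some queries uncacheable, for example queries that call non-deterministic functions such as NOW().

```python
# Simplified model of the MySQL 5.7 query cache switches (illustration only).
# query_cache_active and would_cache are hypothetical names, not a MySQL API.

def query_cache_active(query_cache_type: int, query_cache_size: int) -> bool:
    """The cache can only be used if it is not OFF and has memory allocated."""
    return query_cache_type != 0 and query_cache_size > 0


def would_cache(query_cache_type: int, query_cache_size: int, sql: str) -> bool:
    """Decide whether a SELECT result would be stored in the query cache.

    query_cache_type: 0 = OFF, 1 = ON, 2 = DEMAND.
    """
    if not query_cache_active(query_cache_type, query_cache_size):
        return False
    words = sql.upper().split()
    if query_cache_type == 1:
        # ON: cache every cacheable SELECT except those marked SQL_NO_CACHE.
        return "SQL_NO_CACHE" not in words
    # DEMAND: cache only statements that explicitly ask for SQL_CACHE.
    return "SQL_CACHE" in words


if __name__ == "__main__":
    # The test environment described above: DEMAND mode with some cache memory.
    print(would_cache(2, 1048576, "select distinct a from t0"))
    print(would_cache(2, 1048576, "select sql_cache distinct a from t0"))
```

Under DEMAND mode, the plain test queries are not cached, so they do not need SQL_NO_CACHE. Setting query_cache_size to 0 disables caching regardless of query_cache_type.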

Data preparation

Table t0 stores 100,000 rows with 100 distinct values of a (small data volume, few categories):

drop table if exists t0;
create table t0(
id bigint primary key auto_increment,
a varchar(255) not null
) engine=InnoDB default charset=utf8mb4 collate=utf8mb4_bin;
drop procedure if exists insert_t0_simple_category_data_sp;
delimiter //
create procedure insert_t0_simple_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    -- every 1,000 consecutive rows share the same value of a
    insert into t0(a) value(truncate(@i/1000, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t0_simple_category_data_sp(100000);

Table t1 stores 10,000 rows with 1,000 distinct values of a (small data volume, many categories):

drop table if exists t1;
create table t1 like t0;
drop procedure if exists insert_t1_complex_category_data_sp;
delimiter //
create procedure insert_t1_complex_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    -- every 10 consecutive rows share the same value of a
    insert into t1(a) value(truncate(@i/10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t1_complex_category_data_sp(10000);

Table t2 stores 5,000,000 rows with 500,000 distinct values of a (large data volume, many categories):

drop table if exists t2;
create table t2 like t1;
drop procedure if exists insert_t2_complex_category_data_sp;
delimiter //
create procedure insert_t2_complex_category_data_sp(IN num int)
begin
  set @i = 0;
  while @i < num do
    -- insert into t2 (not t1); every 10 consecutive rows share the same value of a
    insert into t2(a) value(truncate(@i/10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t2_complex_category_data_sp(5000000);
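The category counts follow from the truncate(@i/divisor, 0) expression in each procedure. For non-negative @i, truncating the quotient equals integer floor division, so the distribution can be checked with a short Python sketch. The helper name category_profile is hypothetical.

```python
from collections import Counter


def category_profile(num_rows: int, divisor: int) -> tuple[int, int]:
    """Mimic truncate(@i/divisor, 0) for @i = 0 .. num_rows - 1.

    Returns (number of distinct values of a, rows per value).
    """
    # For non-negative integers, truncating the quotient is floor division.
    counts = Counter(i // divisor for i in range(num_rows))
    rows_per_value = set(counts.values())
    if len(rows_per_value) != 1:
        raise ValueError("values are not evenly distributed")
    return len(counts), rows_per_value.pop()


if __name__ == "__main__":
    # (table, rows inserted by the procedure, divisor used in truncate)
    for table, num_rows, divisor in [("t0", 100_000, 1000),
                                     ("t1", 10_000, 10),
                                     ("t2", 5_000_000, 10)]:
        distinct_values, rows_each = category_profile(num_rows, divisor)
        print(f"{table}: {num_rows} rows, {distinct_values} distinct values, "
              f"{rows_each} rows each")
```

The output shows that t0 has few large groups, while t1 and t2 have many small groups of 10 rows each. This difference between the tables is what the tests below vary.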

Testing Phase

Verify a small amount of data with few categories

Not indexed

set profiling = 1;
select distinct a from t0;
show profiles;
select a from t0 group by a;
show profiles;


This shows that with a small data volume, few categories, and no index, DISTINCT and GROUP BY perform almost the same.
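The two queries can be compared at all because DISTINCT a and GROUP BY a (with no aggregate functions) return the same set of values. The sketch below checks this on t0-shaped data. It uses Python's built-in sqlite3 module as a stand-in for MySQL and times each query with time.perf_counter in place of SHOW PROFILES. SQLite's optimizer differs from MySQL's, so the printed timings say nothing about MySQL. Only the equality of the results carries over.

```python
import sqlite3
import time

# In-memory SQLite table shaped like t0: 100,000 rows, a = str(i // 1000).
# SQLite is only a stand-in here; its optimizer and timings differ from MySQL's.
conn = sqlite3.connect(":memory:")
conn.execute("create table t0 (id integer primary key autoincrement, a text not null)")
conn.executemany("insert into t0(a) values (?)",
                 ((str(i // 1000),) for i in range(100_000)))
conn.commit()


def timed(sql: str) -> tuple[set, float]:
    """Run a query and return (result values as a set, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return {row[0] for row in rows}, time.perf_counter() - start


distinct_values, distinct_secs = timed("select distinct a from t0")
group_values, group_secs = timed("select a from t0 group by a")

# Both forms deduplicate to the same set of values; only their speed can differ.
print(len(distinct_values), distinct_values == group_values)
print(f"distinct: {distinct_secs:.4f}s, group by: {group_secs:.4f}s")
```

Sets are compared rather than lists because neither query guarantees row order without ORDER BY.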

Add index

alter table t0 add index `a_t0_index`(a);

After running the same queries as above:


This shows that with a small data volume, few categories, and an index, DISTINCT and GROUP BY perform almost the same.
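One plausible reason an index evens out the two forms is as follows. Without an index, both queries must build a temporary structure to find the unique values. With an index on a, both can read the already sorted index instead. The sketch below illustrates this with SQLite's EXPLAIN QUERY PLAN, again using sqlite3 as a stand-in. MySQL's own EXPLAIN reports analogous information in its Extra column, such as Using temporary or Using index. The exact plan wording varies between SQLite versions.

```python
import sqlite3

# t0-shaped table in SQLite (a stand-in for MySQL; plans are SQLite's, not MySQL's).
conn = sqlite3.connect(":memory:")
conn.execute("create table t0 (id integer primary key, a text not null)")
conn.executemany("insert into t0(a) values (?)",
                 ((str(i // 1000),) for i in range(100_000)))


def plan(sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    # detail is the last column in both older and newer SQLite versions.
    return " | ".join(row[-1] for row in conn.execute("explain query plan " + sql))


queries = ["select distinct a from t0", "select a from t0 group by a"]
before = {q: plan(q) for q in queries}                  # no index on a yet
conn.execute("create index a_t0_index on t0(a)")
after = {q: plan(q) for q in queries}                   # index on a available

for q in queries:
    print(q)
    print("  no index:  ", before[q])
    print("  with index:", after[q])
```

Without the index, both plans mention a temporary B-tree. With the index, both scan a_t0_index.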

Verify a small amount of data with many categories

Not indexed

After running the same unindexed queries against t1:


This shows that with a small data volume, many categories, and no index, DISTINCT performs slightly better than GROUP BY, but the difference is small.

Add index

alter table t1 add index `a_t1_index`(a);

After running the same queries again with the index in place:


This shows that with a small data volume, many categories, and an index, DISTINCT and GROUP BY perform almost the same.

Verify a large amount of data with many categories

Not indexed

SELECT count(1) FROM t2; 


After running the same unindexed queries against t2:


This shows that with a large data volume, many categories, and no index, DISTINCT performs better than GROUP BY.

Add index

alter table t2 add index `a_t2_index`(a);

After running the same queries with the index in place:


This shows that with a large data volume, many categories, and an index, DISTINCT and GROUP BY perform almost the same.
