Should I use distinct or group by to remove duplicates in MySQL?

Performance comparison of distinct and group by:

  • No index: almost the same for a small amount of data with few types; distinct is slightly better for a small amount of data with many types; distinct is better for a large amount of data with many types.
  • With index: almost the same in all three cases.

For deduplication, prefer distinct when the column has no index; when the column is indexed, either distinct or group by can be used.



Preface

The conclusions commonly found online about the performance of group by versus distinct are as follows: without an index, distinct performs better on a small amount of data and group by performs better on a large amount of data; with an index, group by performs better; and when an index is used, the fewer grouping types there are, the faster distinct is. This article tests those conclusions.

Preparation: disable the query cache

First check whether the query cache is enabled in MySQL. So that cached results do not distort the test timings, the query cache needs to be turned off.

show variables like '%query_cache%'; 

Whether the query cache is enabled is determined by query_cache_type and query_cache_size.

  • Method 1: Edit the my.ini configuration file (here C:\ProgramData\MySQL\MySQL Server 5.7\my.ini) and set query_cache_type to 0 or 2.
  • Method 2: Set query_cache_size to 0 by executing the following statement.
set global query_cache_size = 0;
  • Method 3: If you don't want to turn off the query cache, you can instead clear it before each test with RESET QUERY CACHE.

The current test environment uses query_cache_type=2, which means caching on demand: queries are not cached by default, and a statement is cached only if it includes sql_cache.
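As an alternative to changing server settings, an individual statement can opt out of the query cache. The following is a minimal sketch, assuming MySQL 5.7 (the sql_no_cache modifier is deprecated in MySQL 8.0, where the query cache no longer exists) and the t0 table created in the next section:

```sql
-- Bypass the query cache for these two statements only.
select distinct sql_no_cache a from t0;
select sql_no_cache a from t0 group by a;
```

This keeps the server configuration untouched, at the cost of having to add the modifier to every timed query.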

Data preparation

Table t0 stores 100,000 rows: a small amount of data with few types

drop table if exists t0;
create table t0(
id bigint primary key auto_increment,
a varchar(255) not null
) engine=InnoDB default charset=utf8mb4 collate=utf8mb4_bin;
drop procedure if exists insert_t0_simple_category_data_sp;
delimiter //
create procedure insert_t0_simple_category_data_sp(IN num int)
begin
  -- Insert num rows; every 1000 consecutive rows share one value of a.
  set @i = 0;
  while @i < num do
    insert into t0(a) values (truncate(@i / 1000, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t0_simple_category_data_sp(100000);
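Before timing anything, it may be worth confirming that the data has the intended shape. Since the procedure divides a counter running from 0 to 99,999 by 1000 and truncates, it should produce 100,000 rows with 100 different values of a. The column aliases below are illustrative only:

```sql
-- Row count and number of distinct values of a in t0.
select count(*) as total_rows,
       count(distinct a) as distinct_types
from t0;
```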

Table t1 stores 10,000 rows: a small amount of data with many types

drop table if exists t1;
create table t1 like t0;
drop procedure if exists insert_t1_complex_category_data_sp;
delimiter //
create procedure insert_t1_complex_category_data_sp(IN num int)
begin
  -- Insert num rows; every 10 consecutive rows share one value of a.
  set @i = 0;
  while @i < num do
    insert into t1(a) values (truncate(@i / 10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t1_complex_category_data_sp(10000);

Table t2 stores 5 million rows: a large amount of data with many types

drop table if exists t2;
create table t2 like t1;
drop procedure if exists insert_t2_complex_category_data_sp;
delimiter //
create procedure insert_t2_complex_category_data_sp(IN num int)
begin
  -- Insert num rows into t2; every 10 consecutive rows share one value of a.
  set @i = 0;
  while @i < num do
    insert into t2(a) values (truncate(@i / 10, 0));
    set @i = @i + 1;
  end while;
end
//
delimiter ;
call insert_t2_complex_category_data_sp(5000000);
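With autocommit enabled, each of the 5 million inserts is committed separately, which can make the load slow on InnoDB. One common workaround, offered here as a sketch and not something the original test did, is to run the call inside a single transaction in place of the plain call above:

```sql
-- Load all rows in one transaction instead of committing after every insert.
start transaction;
call insert_t2_complex_category_data_sp(5000000);
commit;
```

A transaction of this size produces a large amount of undo data, so splitting the load into several smaller calls, each in its own transaction, may be preferable on a constrained server.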

Testing Phase

Verify a small amount of data with few types

Not indexed

set profiling = 1;
select distinct a from t0;
show profiles;
select a from t0 group by a;
show profiles;

The profiling results show that with few types, a small amount of data, and no index, distinct and group by perform almost the same.
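show profiles reports only the total duration of each statement. For a closer look at where the time goes, MySQL 5.7 can also break a single statement into stages. A minimal sketch follows; the Query_ID value 1 is a placeholder and should be replaced with the ID that show profiles reports for the statement of interest:

```sql
-- List recent statements with their Query_ID and total duration.
show profiles;
-- Stage-by-stage timing for one statement (1 is a placeholder Query_ID).
show profile for query 1;
```

Note that show profile and show profiles are deprecated in MySQL 5.7 but still available there.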

Add index

alter table t0 add index `a_t0_index`(a);

Then run the same distinct and group by queries as above.

The results show that with few types and a small amount of data, distinct and group by perform almost the same once the index is added.
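To check whether the index on a is actually used by each query, the execution plans can be compared with explain. This is a sketch only: the key column should name a_t0_index if the optimizer picks it, and the Extra column may show notes such as "Using index" or "Using index for group-by", depending on the MySQL version and the data.

```sql
-- Compare the execution plans of the two deduplication queries.
explain select distinct a from t0;
explain select a from t0 group by a;
```

If both plans turn out to be identical, that would be consistent with the near-identical timings measured here.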

Verify a small amount of data with many types

Not indexed

Run the same unindexed queries as above against t1.

The results show that with a small amount of data of many types and no index, distinct is slightly faster than group by, but the difference is small.

Add index

alter table t1 add index `a_t1_index`(a);

Then run the same queries again, this time with the index in place.

The results show that with a small amount of data of many types, distinct and group by perform almost the same once the index is added.

Verify a large amount of data with many types

Not indexed

SELECT count(1) FROM t2; 

The count confirms how many rows t2 holds.

Run the same unindexed queries as above against t2.

The results show that with a large amount of data of many types and no index, distinct performs better than group by.
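One possible explanation for this gap, which the original test does not verify: in MySQL 5.7 and earlier, group by also sorts its result by the grouping columns, while distinct has no such requirement. A sketch for checking this hypothesis is to suppress the sort with order by null and compare the timings again:

```sql
set profiling = 1;
select distinct a from t2;
-- order by null asks MySQL not to sort the grouped result.
select a from t2 group by a order by null;
show profiles;
```

If the two durations move closer together, the implicit sort is a likely contributor to the difference; if not, the cause lies elsewhere.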

Add index

alter table t2 add index `a_t2_index`(a);

Then run the same queries again, this time with the index in place.

The results show that with a large amount of data of many types, distinct and group by perform almost the same once the index is added.

