Summary of SQL deduplication methods

When using SQL to extract data, we often encounter duplicate values in the table. For example, if we want to get UV (unique visitors), we need to deduplicate.

In MySQL, distinct or group by clause is usually used, but in SQLs that support window functions (such as Hive SQL , Oracle , etc.), the ROW_NUMBER window function can also be used for deduplication.

For example, there is a table task like this:

Remark:

task_id : task id;
order_id : order id;
start_time : start time

Note : One task corresponds to multiple orders

We need to find the total number of tasks. Since task_id is not unique, we need to remove duplicates:

distinct

 -- List all unique values of task_id (after deduplication)

select distinct task_id
from Task;

--Total number of tasks select count(distinct task_id) task_num
from Task;

distinct is usually less efficient. It is not suitable for displaying specific values after deduplication, and is generally used together with count to calculate the number of entries.
When distinct is used, it is placed after select to deduplicate the values of all the subsequent fields. For example, if there are two fields after distinct , then the two records 1,1 and 1,2 are not duplicate values.

group by

 -- List all unique values of task_id (after deduplication, null is also a value)
-- select task_id
-- from Task
-- group by task_id;

--Total number of tasks select count(task_id) task_num
from (select task_id
   from Task
   group by task_id) tmp;

row_number

row_number is a window function with the following syntax:

row_number() over (partition by <用于分組的字段名> order by <用于組內排序的字段名>)
partition by part can be omitted.

 -- Use select count(case when rn=1 then task_id else null end) task_num in SQL that supports window functions
from (select task_id
    , row_number() over (partition by task_id order by start_time) rn
  from Task) tmp;

In addition, let's use a table test to explain the use of distinct and group by in deduplication:

 -- The semicolon below is used to separate rows select distinct user_id
from Test; -- returns 1; 2

select distinct user_id, user_type
from Test; -- returns 1, 1; 1, 2; 2, 1

select user_id
from Test
group by user_id; -- returns 1; 2

select user_id, user_type
from Test
group by user_id, user_type; -- returns 1, 1; 1, 2; 2, 1

select user_id, user_type
from Test
group by user_id; 
  -- Hive, Oracle, etc. will report an error, but MySQL can be written like this.
-- Returns 1, 1 or 1, 2; 2, 1 (two rows in total). Only the fields after group by will be deduplicated, which means the number of records returned at the end is equal to the number of records in the previous SQL statement, that is, 2 records. For fields that are not placed after group by but are placed in select, only one record will be returned (usually the first one, but there should be no pattern).

This is the end of this article on the summary of SQL deduplication methods. For more relevant SQL deduplication methods, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:

Summary of three deduplication methods in SQL
Detailed example of using the distinct method in MySQL
How to optimize MySQL deduplication operation to the extreme
A simple method to merge and remove duplicate MySQL tables
MySQL deduplication methods
Detailed explanation of two methods of deduplication in MySQL and example code
SQL Learning Notes 5: How to remove duplicates and assign values to newly added fields

<<: Share 10 of the latest web front-end frameworks (translation)

>>: Pure CSS to achieve hover image pop-out pop-up effect example code

CSS float (float, clear) popular explanation and experience sharing

Summary of SQL deduplication methods

CSS float (float, clear) popular explanation and experience sharing

Detailed explanation of Vue's caching method example

Web design reference firefox default style

How to configure nginx+php+mysql in docker

CSS3 transition rotation perspective 2d3d animation and other effects example code

Docker container source code deployment httpd use storage volume to deploy the website (recommended)

Teach you how to build hive3.1.2 on Tencent Cloud

Solve the abnormal error when building vue environment with webpack

Implementation process of nginx high availability cluster

How to install phabricator using Docker

Recommend

MySQL 8.0.12 decompression version installation graphic tutorial under Windows 10

Solution to slow network request in docker container

In the interview, I was asked whether select...for update would lock the table or the row.

Introduction to the usage of exists and except in SQL Server

How to manually scroll logs in Linux system

Analysis of slow insert cases caused by large transactions in MySQL

Solve MySQL deadlock routine by updating different indexes

MySQL 8.0.12 installation and configuration method graphic tutorial

How to use iostat to view Linux hard disk IO performance

Using JS to implement a small game of aircraft war

Docker realizes the connection with the same IP network segment

MySQL series 9 MySQL query cache and index

A brief discussion on Flex layout and scaling calculation

Detailed steps to install Hadoop cluster under Linux

Detailed tutorial for installing MySQL on Linux