Summary of SQL deduplication methods

When extracting data with SQL, we often run into duplicate values in a table. For example, to compute UV (unique visitors), we need to deduplicate.

In MySQL, the distinct keyword or a group by clause is usually used. In SQL dialects that support window functions (such as Hive SQL and Oracle, and MySQL from version 8.0), the row_number window function can also be used for deduplication.

For example, consider a table Task with the following fields:

  • task_id: task id;
  • order_id: order id;
  • start_time: start time.

Note: one task corresponds to multiple orders.

We need to find the total number of tasks. Since task_id is not unique, we need to remove duplicates:

distinct

-- List all unique values of task_id (after deduplication)
select distinct task_id
from Task;

-- Total number of tasks
select count(distinct task_id) task_num
from Task;


distinct is typically less efficient on large tables, because the database has to sort or hash the selected values to find the duplicates. It is rarely used to display the deduplicated rows themselves; more often it is combined with count to calculate the number of unique entries.
distinct is placed directly after select and applies to the combination of all the fields that follow it, not only the first one. For example, if two fields follow distinct, the records 1,1 and 1,2 are not duplicates of each other.
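As a rough illustration of the points above, the sketch below runs the distinct queries against a few made-up rows in an in-memory SQLite database, using Python's built-in sqlite3 module. The sample rows and the choice of SQLite are assumptions made for this example; only the table and field names come from the article.

```python
import sqlite3

# Hypothetical sample rows: (task_id, order_id, start_time).
# The last row is an exact repeat of the one before it.
rows = [
    (1, 101, "2021-01-01 09:00:00"),
    (1, 102, "2021-01-01 10:00:00"),
    (2, 201, "2021-01-02 09:00:00"),
    (3, 301, "2021-01-03 09:00:00"),
    (3, 302, "2021-01-03 11:00:00"),
    (3, 302, "2021-01-03 11:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table Task (task_id integer, order_id integer, start_time text)")
conn.executemany("insert into Task values (?, ?, ?)", rows)

# distinct on a single field: one row per task_id.
unique_task_ids = sorted(r[0] for r in conn.execute("select distinct task_id from Task"))

# count(distinct ...) gives the number of unique task_id values directly.
task_num = conn.execute("select count(distinct task_id) as task_num from Task").fetchone()[0]

# distinct on two fields: rows are duplicates only if BOTH values match,
# so only the repeated (3, 302) row is collapsed.
pair_count = len(conn.execute("select distinct task_id, order_id from Task").fetchall())

print(unique_task_ids)  # [1, 2, 3]
print(task_num)         # 3
print(pair_count)       # 5
conn.close()
```

With six input rows, the single-field query collapses them to three tasks, while the two-field query removes only the one row that repeats in both fields.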

group by

-- List all unique values of task_id (after deduplication; null also counts as a value)
-- select task_id
-- from Task
-- group by task_id;

-- Total number of tasks (note: count(task_id) does not count the null group, if there is one)
select count(task_id) task_num
from (select task_id
      from Task
      group by task_id) tmp;
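The remark that null is also a value has a practical consequence: group by puts all null task_id rows into one group, but count(task_id) skips that group, while count(*) includes it. The sketch below shows this on made-up data in an in-memory SQLite database via Python's sqlite3 module; the rows and the use of SQLite are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows: (task_id, order_id); two rows have no task_id.
rows = [(1, 101), (1, 102), (2, 201), (None, 900), (None, 901)]

conn = sqlite3.connect(":memory:")
conn.execute("create table Task (task_id integer, order_id integer)")
conn.executemany("insert into Task values (?, ?)", rows)

dedup = "select task_id from Task group by task_id"

# group by yields one row per distinct task_id, with all nulls in a single group.
group_count = len(conn.execute(dedup).fetchall())

# count(task_id) ignores the null group; count(*) counts every group.
count_non_null = conn.execute(f"select count(task_id) from ({dedup}) tmp").fetchone()[0]
count_all = conn.execute(f"select count(*) from ({dedup}) tmp").fetchone()[0]

print(group_count)     # 3  (null, 1, 2)
print(count_non_null)  # 2
print(count_all)       # 3
conn.close()
```

Which of the two counts is the right one depends on whether rows without a task_id should count as a task.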

row_number

row_number is a window function with the following syntax:

row_number() over (partition by <fields to group by> order by <fields to sort by within each group>)

The partition by part can be omitted.

-- In SQL dialects that support window functions
select count(case when rn = 1 then task_id else null end) task_num
from (select task_id
           , row_number() over (partition by task_id order by start_time) rn
      from Task) tmp;
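Unlike distinct and group by, row_number can also keep one complete row per group, for example the earliest order of each task. The sketch below demonstrates both uses on made-up rows in an in-memory SQLite database through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions); the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (task_id, order_id, start_time), deliberately unordered.
rows = [
    (1, 102, "2021-01-01 10:00:00"),
    (1, 101, "2021-01-01 09:00:00"),
    (2, 201, "2021-01-02 09:00:00"),
    (3, 302, "2021-01-03 11:00:00"),
    (3, 301, "2021-01-03 09:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table Task (task_id integer, order_id integer, start_time text)")
conn.executemany("insert into Task values (?, ?, ?)", rows)

# Number the rows of each task by start_time; rn = 1 marks the earliest row.
numbered = """
    select task_id, order_id,
           row_number() over (partition by task_id order by start_time) as rn
    from Task
"""

# Keep only the earliest order of each task.
first_orders = conn.execute(
    f"select task_id, order_id from ({numbered}) tmp where rn = 1 order by task_id"
).fetchall()

# Count tasks the same way as the query in the article.
task_num = conn.execute(
    f"select count(case when rn = 1 then task_id else null end) from ({numbered}) tmp"
).fetchone()[0]

print(first_orders)  # [(1, 101), (2, 201), (3, 301)]
print(task_num)      # 3
conn.close()
```

If two rows of the same task share the same start_time, which one gets rn = 1 is not defined; adding a tie-breaking field to the order by would make the choice stable.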

In addition, let's use a table Test to illustrate how distinct and group by behave when deduplicating:

-- In the comments below, semicolons separate the returned rows
select distinct user_id
from Test; -- returns 1; 2

select distinct user_id, user_type
from Test; -- returns 1, 1; 1, 2; 2, 1

select user_id
from Test
group by user_id; -- returns 1; 2

select user_id, user_type
from Test
group by user_id, user_type; -- returns 1, 1; 1, 2; 2, 1

select user_id, user_type
from Test
group by user_id;
-- Hive, Oracle, etc. will report an error. MySQL accepts this only when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5).
-- Returns 1, 1 or 1, 2; and 2, 1 (two rows in total). Only the fields listed after group by are deduplicated, so the number of rows returned equals that of the group by user_id query above, that is, 2 rows. For a field that appears in select but not after group by, only one value is returned per group (often the first one encountered, but this is not guaranteed).
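When a single, predictable value per group is needed, one portable option is to wrap the non-grouped field in an aggregate such as min. This is a suggested alternative rather than something the queries above do. The sketch below checks it on made-up contents for Test in an in-memory SQLite database via Python's sqlite3 module; the rows and the use of SQLite are assumptions for illustration.

```python
import sqlite3

# Hypothetical contents of Test: (user_id, user_type), with one repeated row.
rows = [(1, 1), (1, 1), (1, 2), (2, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("create table Test (user_id integer, user_type integer)")
conn.executemany("insert into Test values (?, ?)", rows)

# distinct and group by over the same fields return the same set of rows.
distinct_pairs = conn.execute(
    "select distinct user_id, user_type from Test order by user_id, user_type"
).fetchall()
grouped_pairs = conn.execute(
    "select user_id, user_type from Test "
    "group by user_id, user_type order by user_id, user_type"
).fetchall()

# One row per user_id, with an explicit rule for which user_type to keep.
one_per_user = conn.execute(
    "select user_id, min(user_type) as user_type from Test "
    "group by user_id order by user_id"
).fetchall()

print(distinct_pairs)  # [(1, 1), (1, 2), (2, 1)]
print(grouped_pairs)   # [(1, 1), (1, 2), (2, 1)]
print(one_per_user)    # [(1, 1), (2, 1)]
conn.close()
```

Because every selected field is either grouped or aggregated, this form is also accepted by dialects that reject the bare group by user_id query.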
