The difference and usage of distinct and row_number() over() in SQL

The difference and usage of distinct and row_number() over() in SQL

1 Introduction

When we write SQL statements to operate data in the database, we may encounter some unpleasant problems. For example, for records with the same name in the same field, we only need to display one, but in fact the database may contain multiple records with the same name. Therefore, multiple records will be displayed during retrieval, which goes against our original intention! Therefore, in order to avoid this situation, we need to perform "deduplicate" processing. So what is "deduplicate"? To put it simply, it means that for the same field, only one record with the same content is displayed.

So, how to achieve the "de-duplication" function? In this regard, we have two ways to achieve this function.

The first method is to add the distinct keyword when writing the select statement;

The second method is to call the row_number() over() function when writing the select statement.

Both of the above methods can achieve the "deduplication" function, so what are the similarities and differences between them? Next, the author will give a detailed explanation.

2 distinct

In SQL, the keyword distinct is used to return uniquely distinct values. Its syntax format is:

SELECT DISTINCT column_name FROM table_name

Assume there is a table "CESHIDEMO" containing two fields, NAME and AGE, with the following format:

CESHIDEMO

Observing the above table, we will find that there are two records with the same NAME and three records with the same AGE. If we run the following SQL statement,

/**
* Where PPPRDER is the name of the schema, that is, the table CESHIDEMO is in PPPRDER*/

select name from PPPRDER.CESHIDEMO

You will get the following result:

name

Observing the result, we will find that among the above four records, there are two records with the same NAME value, that is, the values ​​of the second and third records are both "gavin". So, if we want to display only one record with the same NAME, how can we achieve it? At this time, you need to use the distinct keyword! Next, run the following SQL statement,

select distinct name from PPPRDER.CESHIDEMO

You will get the following result:

distinct

Observing the result, it is obvious that our requirements have been achieved! However, we can't help but wonder, what effect will it have if the distinct keyword is applied to two fields at the same time? Now that we have thought of it, let's try it and run the following SQL statement:

select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

nameandage

Observe the result. Oops, it seems to have no effect? She showed all the records! There are two records with the same NAME value and three records with the same AGE value. There is no change at all! But in fact, the result should be like this. Because when distinct is used on multiple fields, it will only remove the records with the same field values . Obviously, our four "poor" records do not meet this condition, so distinct will think that the above four records are not the same. Words are not enough, so let's add an identical record to the "CESHIDEMO" table to verify it. The table after adding a record looks like this:

添加一條記錄

Then run the following SQL statement,

select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

nameandage

Observing the results, we can perfectly verify our conclusion above.

In addition, there is one point that you need to pay special attention to, that is : the keyword distinct can only be put at the front of all fields in the SQL statement to work. If it is put in the wrong place, SQL will not report an error, but it will not have any effect.

3 row_number() over()

In the SQL Server database, we are provided with a function row_number() for numbering records in the database table. When used, it is followed by a function over(), and the function over() is used to group and sort the records in the table. The syntax used for both is:

ROW_NUMBER() OVER(PARTITION BY COLUMN1 ORDER BY COLUMN2)

Meaning: Group the records in the table by field COLUMN1 and sort them by field COLUMN2, where

PARTITION BY: indicates grouping ORDER BY: indicates sorting

Next, we also test with the data in the table "CESHIDEMO". First, the query results without using the row_number() over() function are shown below:

添加一條記錄

Then, run the following SQL statement,

select PPPRDER.CESHIDEMO.*, row_number() over(partition by age order by name desc) from PPPRDER.CESHIDEMO

The results are as follows:

函數

From the above results, we can see that based on the original table, there is an additional column marked with numerical sorting. Then, if we analyze the SQL statement we run in reverse, we will find that it is indeed grouped by the value of the field AGE and sorted by the value of the field NAME! Therefore, the functionality of the function is verified.

Next, let’s study how to use the row_number() over() function to achieve the “de-duplication” function. By observing the above results, we can find that if we group by NAME, sort by AGE, and then take the first record in each group, we may be able to achieve the "de-duplicate" function! Then try it and run the following SQL statement,

/*
* rn indicates the last column added*/

select * from 
(select PPPRDER.CESHIDEMO.*, row_number() over(partition by name order by age desc) rn from PPPRDER.CESHIDEMO)
where rn = 1

After running, the results are as follows:

rn

Observing the above results, we find that, oops, we have achieved the function of "deduplicate" data accidentally! Unfortunately, if we are careful, we will find something unpleasant. That is, when executing the above SQL statement for "de-duplication", a record with a NAME value of "gavin" and an AGE value of "18" is filtered out. However, in real life, it is very common to have the same name but different ages.

4 Conclusion

By reading and practicing the above content, we already know that whether using the keyword distinct or the function row_number() over() can achieve the function of "deduplicate" data. However, in the process of implementation, we must pay special attention to the usage characteristics and differences between the two.

When using the keyword distinct, we need to know that there is a difference between when it works on a single field and when it works on multiple fields. When it works on a single field, it "deduplicates" all data with duplicate values ​​in the field in the table; when it works on multiple fields, it "deduplicates" data with the same values ​​in all fields in the table (that is, the multiple fields that distinct works on).

When using the row_number() over() function, it first groups and sorts the records, then takes out the first record of each group to perform "de-duplication" (as in this blog post). Of course, we can also use different restrictions to perform "deduplicate" here. How to implement it specifically requires everyone to think about it!

Finally, in this blog post, the author elaborates on his understanding of using the keyword distinct and the function row_number() over() to "deduplicate" data. I hope the above content can be helpful to everyone!

Thank you for reading, I hope it can help you, thank you for your support of this site!

<<:  Detailed explanation of asynchronous generators and asynchronous iterations in Node.js

>>:  Friendly Alternatives to Find Tool in Linux

Recommend

How to deploy MySQL and Redis services using Docker

Table of contents How to deploy MySQL service usi...

Summary of CSS front-end knowledge points (must read)

1. The concept of css: (Cascading Style Sheet) Ad...

Detailed tutorial on installing Docker on CentOS 8.4

Table of contents Preface: System Requirements: I...

The ultimate solution for writing bash scripts with nodejs

Table of contents Preface zx library $`command` c...

5 solutions to CSS box collapse

First, what is box collapse? Elements that should...

Linux uses join -a1 to merge two files

To merge the following two files, merge them toge...

The latest version of MySQL5.7.19 decompression version installation guide

MySQL version: MySQL Community Edition (GPL) ----...

Detailed examples of variable and function promotion in JavaScript

js execution Lexical analysis phase: includes thr...

js to achieve simple drag effect

This article shares the specific code of js to ac...

Should I abandon JQuery?

Table of contents Preface What to use if not jQue...

Example code for using text-align and margin: 0 auto to center in CSS

Use text-align, margin: 0 auto to center in CSS W...

Causes and solutions to the garbled character set problem in MySQL database

Preface Sometimes when we view database data, we ...

JavaScript anti-shake case study

principle The principle of anti-shake is: you can...