1 Introduction When we write SQL statements to operate data in the database, we may encounter some unpleasant problems. For example, for records with the same name in the same field, we only need to display one, but in fact the database may contain multiple records with the same name. Therefore, multiple records will be displayed during retrieval, which goes against our original intention! Therefore, in order to avoid this situation, we need to perform "deduplicate" processing. So what is "deduplicate"? To put it simply, it means that for the same field, only one record with the same content is displayed. So, how to achieve the "de-duplication" function? In this regard, we have two ways to achieve this function. The first method is to add the distinct keyword when writing the select statement; The second method is to call the row_number() over() function when writing the select statement. Both of the above methods can achieve the "deduplication" function, so what are the similarities and differences between them? Next, the author will give a detailed explanation. 2 distinct In SQL, the keyword distinct is used to return uniquely distinct values. Its syntax format is: SELECT DISTINCT column_name FROM table_name Assume there is a table "CESHIDEMO" containing two fields, NAME and AGE, with the following format: Observing the above table, we will find that there are two records with the same NAME and three records with the same AGE. If we run the following SQL statement, /** * Where PPPRDER is the name of the schema, that is, the table CESHIDEMO is in PPPRDER*/ select name from PPPRDER.CESHIDEMO You will get the following result: Observing the result, we will find that among the above four records, there are two records with the same NAME value, that is, the values of the second and third records are both "gavin". So, if we want to display only one record with the same NAME, how can we achieve it? At this time, you need to use the distinct keyword! Next, run the following SQL statement, select distinct name from PPPRDER.CESHIDEMO You will get the following result: Observing the result, it is obvious that our requirements have been achieved! However, we can't help but wonder, what effect will it have if the distinct keyword is applied to two fields at the same time? Now that we have thought of it, let's try it and run the following SQL statement: select distinct name, age from PPPRDER.CESHIDEMO The results are as follows: Observe the result. Oops, it seems to have no effect? She showed all the records! There are two records with the same NAME value and three records with the same AGE value. There is no change at all! But in fact, the result should be like this. Because when distinct is used on multiple fields, it will only remove the records with the same field values . Obviously, our four "poor" records do not meet this condition, so distinct will think that the above four records are not the same. Words are not enough, so let's add an identical record to the "CESHIDEMO" table to verify it. The table after adding a record looks like this: Then run the following SQL statement, select distinct name, age from PPPRDER.CESHIDEMO The results are as follows: Observing the results, we can perfectly verify our conclusion above. In addition, there is one point that you need to pay special attention to, that is : the keyword distinct can only be put at the front of all fields in the SQL statement to work. If it is put in the wrong place, SQL will not report an error, but it will not have any effect. 3 row_number() over() In the SQL Server database, we are provided with a function row_number() for numbering records in the database table. When used, it is followed by a function over(), and the function over() is used to group and sort the records in the table. The syntax used for both is: ROW_NUMBER() OVER(PARTITION BY COLUMN1 ORDER BY COLUMN2) Meaning: Group the records in the table by field COLUMN1 and sort them by field COLUMN2, where PARTITION BY: indicates grouping ORDER BY: indicates sorting Next, we also test with the data in the table "CESHIDEMO". First, the query results without using the row_number() over() function are shown below: Then, run the following SQL statement, select PPPRDER.CESHIDEMO.*, row_number() over(partition by age order by name desc) from PPPRDER.CESHIDEMO The results are as follows: From the above results, we can see that based on the original table, there is an additional column marked with numerical sorting. Then, if we analyze the SQL statement we run in reverse, we will find that it is indeed grouped by the value of the field AGE and sorted by the value of the field NAME! Therefore, the functionality of the function is verified. Next, let’s study how to use the row_number() over() function to achieve the “de-duplication” function. By observing the above results, we can find that if we group by NAME, sort by AGE, and then take the first record in each group, we may be able to achieve the "de-duplicate" function! Then try it and run the following SQL statement, /* * rn indicates the last column added*/ select * from (select PPPRDER.CESHIDEMO.*, row_number() over(partition by name order by age desc) rn from PPPRDER.CESHIDEMO) where rn = 1 After running, the results are as follows: Observing the above results, we find that, oops, we have achieved the function of "deduplicate" data accidentally! Unfortunately, if we are careful, we will find something unpleasant. That is, when executing the above SQL statement for "de-duplication", a record with a NAME value of "gavin" and an AGE value of "18" is filtered out. However, in real life, it is very common to have the same name but different ages. 4 Conclusion By reading and practicing the above content, we already know that whether using the keyword distinct or the function row_number() over() can achieve the function of "deduplicate" data. However, in the process of implementation, we must pay special attention to the usage characteristics and differences between the two. When using the keyword distinct, we need to know that there is a difference between when it works on a single field and when it works on multiple fields. When it works on a single field, it "deduplicates" all data with duplicate values in the field in the table; when it works on multiple fields, it "deduplicates" data with the same values in all fields in the table (that is, the multiple fields that distinct works on). When using the row_number() over() function, it first groups and sorts the records, then takes out the first record of each group to perform "de-duplication" (as in this blog post). Of course, we can also use different restrictions to perform "deduplicate" here. How to implement it specifically requires everyone to think about it! Finally, in this blog post, the author elaborates on his understanding of using the keyword distinct and the function row_number() over() to "deduplicate" data. I hope the above content can be helpful to everyone! Thank you for reading, I hope it can help you, thank you for your support of this site! |
<<: Detailed explanation of asynchronous generators and asynchronous iterations in Node.js
>>: Friendly Alternatives to Find Tool in Linux
Table of contents How to deploy MySQL service usi...
1. The concept of css: (Cascading Style Sheet) Ad...
Table of contents Preface: System Requirements: I...
Table of contents Preface zx library $`command` c...
Anyone who has used Windows Remote Desktop to con...
First, what is box collapse? Elements that should...
To merge the following two files, merge them toge...
MySQL version: MySQL Community Edition (GPL) ----...
js execution Lexical analysis phase: includes thr...
This article shares the specific code of js to ac...
Table of contents Preface What to use if not jQue...
Use text-align, margin: 0 auto to center in CSS W...
Preface Sometimes when we view database data, we ...
Background of the problem The server monitoring s...
principle The principle of anti-shake is: you can...