1. Data Deduplication

In daily work, a result set exported from a Hive or Impala query may contain duplicate rows. Re-running the query is unattractive when it takes a long time and the exported file is large, so it is easier to remove the duplicates from the file with Linux commands.

The following is an example. Suppose aaa.txt contains 3 duplicate records and only one copy should be kept:

    sort aaa.txt | uniq > bbb.txt

This removes the duplicate data from aaa.txt and writes the result to bbb.txt. The sort step is required because uniq only collapses duplicate lines that are adjacent. Afterwards, bbb.txt retains only one copy of the record.

2. Data intersection, union, and difference

1) Intersection (equivalent to user_2019 inner join user_2020 on user_2019.user_no=user_2020.user_no)

2) Union (equivalent to user_2019.user_no union user_2020.user_no)

3) Difference
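
One common way to perform the three set operations with sort and uniq is sketched below. It is not necessarily the exact set of commands the author used. The file names user_2019.txt and user_2020.txt, the intermediate and output file names, and the sample values are assumptions for illustration, with one user_no per line. Each input is deduplicated first, because the uniq -d and uniq -u tricks rely on a line appearing at most once per file.

```shell
#!/bin/sh
# Hypothetical sample data: one user_no per line.
printf '%s\n' 1001 1002 1003 > user_2019.txt
printf '%s\n' 1002 1003 1004 > user_2020.txt

# Deduplicate each input so a line occurs at most once per file.
sort -u user_2019.txt > u2019.sorted
sort -u user_2020.txt > u2020.sorted

# Intersection: lines that occur in both files (printed once by uniq -d).
sort u2019.sorted u2020.sorted | uniq -d > intersection.txt

# Union: every distinct line from either file.
sort u2019.sorted u2020.sorted | uniq > union.txt

# Difference (2019 minus 2020): listing the 2020 file twice means any
# line that appears in it occurs at least twice, so uniq -u drops it.
sort u2019.sorted u2020.sorted u2020.sorted | uniq -u > difference.txt

cat intersection.txt union.txt difference.txt
```

With this sample data, intersection.txt holds 1002 and 1003, union.txt holds 1001 through 1004, and difference.txt holds only 1001. Swapping the roles of the two files in the last command gives the difference in the other direction.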