1. Data Deduplication

In daily work, a query exported from Hive or Impala may contain duplicate rows. Re-running the query is often not attractive, because it takes a long time and the exported file is large, so a natural alternative is to remove the duplicates from the file with Linux commands. The following is an example.

Suppose aaa.txt contains 3 duplicate rows, and we want to remove the redundant ones and keep only one copy:

sort aaa.txt | uniq > bbb.txt

This removes the duplicate rows from aaa.txt and writes the result to bbb.txt. The file is sorted first because uniq only collapses duplicate lines that are adjacent to each other. Afterwards, bbb.txt retains only one copy of the duplicated row.

2. Data intersection, union, and difference

1) Intersection (equivalent to user_2019 inner join user_2020 on user_2019.user_no=user_2020.user_no)

2) Union (equivalent to user_2019.user_no union user_2020.user_no)

3) Difference
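The three set operations above are listed without their commands. The following is a minimal sketch of one common way to do them with sort and uniq, not necessarily the exact commands the author used. The file names user_2019.txt and user_2020.txt, the helper files a.txt and b.txt, and the sample user numbers are all assumptions made for illustration, with one user_no per line.

```shell
#!/bin/sh
# Hypothetical sample data (not from the article): one user_no per line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1001 1002 1003 1003 > user_2019.txt
printf '%s\n' 1002 1003 1004 > user_2020.txt

# Deduplicate each file first, so a value repeated inside one file
# is not mistaken for a value shared by both files.
sort -u user_2019.txt > a.txt
sort -u user_2020.txt > b.txt

# 1) Intersection: uniq -d prints only lines that occur more than once,
#    which here means lines present in both files.
sort a.txt b.txt | uniq -d > intersection.txt   # 1002, 1003

# 2) Union: every distinct line from either file.
sort -u a.txt b.txt > union.txt                 # 1001, 1002, 1003, 1004

# 3) Difference (2019 minus 2020): b.txt is listed twice, so each of its
#    lines occurs at least twice; uniq -u prints only lines that occur
#    exactly once, leaving the lines found only in a.txt.
sort a.txt b.txt b.txt | uniq -u > difference.txt   # 1001
```

The per-file sort -u step matters: without it, the value 1003 repeated inside user_2019.txt would show up in the uniq -d output even if user_2020.txt did not contain it. To get the difference in the other direction, list a.txt twice instead of b.txt.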