How to find identical files in Linux

As a computer is used, clutter builds up on the system. A typical case is the same file being saved under several names or in several locations, which makes the disk harder to manage and can take up a large amount of space.

So if your computer is running out of space, you can look for such files and delete the ones you do not need. On Linux, we can find names that refer to the same file by comparing inode numbers.

An inode is a data structure that records all information about a file except its name and its contents. If two or more names on the same filesystem have the same inode number, they refer to one and the same file, even if the names and directories differ: the contents, owner, and permissions are shared. Inode numbers are only unique within a single filesystem, so the same number on a different filesystem means nothing.

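Names like these are called "hard links". Hard links have the same inode number but different file names. Because all hard links share a single copy of the data, deleting one name frees disk space only once the last link is removed. A soft link (symbolic link), by contrast, is a shortcut that points to the target file by path and has its own inode number.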

$ ls -l my*
-rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 myfile
lrwxrwxrwx 1 liangxu liangxu 6 Apr 15 11:18 myref -> myfile
-rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 mytwin
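
The difference can be reproduced in a scratch directory. The following is a minimal sketch, assuming GNU stat; the helper names inode_of and link_count_of and the temporary directory are illustrative and not part of the original example.

```shell
#!/bin/bash
# Sketch: a hard link shares its target's inode, a soft link has its own.
# inode_of and link_count_of are hypothetical helpers (GNU stat assumed).

# Print the inode number of a path. stat does not follow symlinks by default.
inode_of() {
  stat -c '%i' -- "$1"
}

# Print the hard-link count of a path.
link_count_of() {
  stat -c '%h' -- "$1"
}

# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d) || exit 1

echo "some content" > "$demo_dir/myfile"
ln "$demo_dir/myfile" "$demo_dir/mytwin"   # hard link: second name, same inode
ln -s myfile "$demo_dir/myref"             # soft link: stores the path "myfile"

echo "myfile inode: $(inode_of "$demo_dir/myfile") links: $(link_count_of "$demo_dir/myfile")"
echo "mytwin inode: $(inode_of "$demo_dir/mytwin") links: $(link_count_of "$demo_dir/mytwin")"
echo "myref  inode: $(inode_of "$demo_dir/myref")"

rm -rf "$demo_dir"
```

The first two lines should report the same inode number and a link count of 2, while myref reports a different inode.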

A plain listing does not show which files in a directory share an inode, but they are not difficult to identify. Run ls -i and sort the output by inode number:

$ ls -i | sort -n | more
 ...
 788000 myfile <==
 788000 mytwin <==
 801865 Name_Labels.pdf
 786692 never leave home angry
 920242 NFCU_Docs
 800247 nmap-notes

The first column of the output is the inode number. Because the list is sorted, files that share an inode appear on adjacent lines and can be spotted at a glance.

If you only want to find the hard links of one particular file, use the find command with the -samefile test:

$ find . -samefile myfile
./myfile
./save/mycopy
./mytwin

These files all have the same inode number. To confirm this, add -ls to display more information:

$ find . -samefile myfile -ls
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 ./myfile
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 ./save/mycopy
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 ./mytwin

Apart from the file names, the details of these entries are exactly the same. Careful readers may notice that the link count (the fourth column of this output) is 4, yet we found only 3 files. That means one more name shares this inode, but it lies outside the directory tree we searched, so this command did not find it.
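
This check can be automated by subtracting the number of names find locates from the link count. The sketch below assumes GNU stat and GNU find; missing_links is a hypothetical helper name, not something from the original article.

```shell
#!/bin/bash
# Sketch: how many hard links of FILE lie outside the directory tree searched?
# missing_links is a hypothetical helper (GNU stat and GNU find assumed).

# Usage: missing_links FILE [START_DIR]
# START_DIR defaults to the current directory.
missing_links() {
  local file=$1 start=${2:-.}
  local total found
  # %h is the hard-link count recorded in the inode.
  total=$(stat -c '%h' -- "$file") || return 1
  # Print one dot per matching name and count the dots, so that unusual
  # file names (spaces, newlines) cannot distort the count.
  found=$(find "$start" -samefile "$file" -printf '.' 2>/dev/null | wc -c)
  echo $(( total - found ))
}

# Run only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
  missing_links "$@"
fi
```

For the listing above, a link count of 4 with 3 names found would leave 1 link unaccounted for.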

Typing these commands every time is tedious, so here is a small script that finds the files sharing an inode in the current directory:

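#!/bin/bash

# searches the current directory for files sharing inodes

prev=""
shown=""

# use a private temp file instead of /tmp/$0, which breaks when $0 contains a path
tmpfile=$(mktemp) || exit 1

# list files by inode
ls -i | sort -n > "$tmpfile"

# search through the list for duplicate inode numbers
while read -r inode name
do
  # report each duplicated inode once, even if three or more files share it
  if [ "$inode" = "$prev" ] && [ "$inode" != "$shown" ]; then
    # match the inode column exactly rather than as a substring
    awk -v n="$inode" '$1 == n' "$tmpfile"
    shown=$inode
  fi
  prev=$inode
done < "$tmpfile"

# clean up
rm -f "$tmpfile"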

Running results:

$ ./findHardLinks
 788000 myfile
 788000 mytwin
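
The same result can be obtained without a temporary file. This is a sketch of one alternative, not the method used above: it relies on the GNU find extension -printf, and list_hard_links is a hypothetical function name. It assumes file names contain no newline characters.

```shell
#!/bin/bash
# Sketch: list regular files in one directory that share an inode.
# list_hard_links is a hypothetical helper; -printf requires GNU find.

# Usage: list_hard_links [DIR]
list_hard_links() {
  local dir=${1:-.}
  # Emit "inode path" for every regular file directly inside DIR that has
  # more than one link, sort numerically by inode, then keep only the
  # inodes that occur at least twice within this directory.
  find "$dir" -maxdepth 1 -type f -links +1 -printf '%i %p\n' |
    sort -n |
    awk '
      $1 == prev { if (!shown) print held; print; shown = 1; next }
      { prev = $1; held = $0; shown = 0 }
    '
}

list_hard_links "${1:-.}"
```

Filtering with -links +1 first means that files with a single name are never sorted or compared at all.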

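Of course, you can also use the find command to search the whole system for names with a given inode number. Keep in mind that inode numbers are only unique within one filesystem, so a match on a different filesystem is an unrelated file.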

$ find / -inum 788000 -ls 2> /dev/null
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 /tmp/mycopy
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 /home/liangxu/myfile
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 /home/liangxu/save/mycopy
 788000 4 -rw-r--r-- 4 liangxu liangxu 228 Apr 12 19:37 /home/liangxu/mytwin

In this command, we redirect error messages to the special file /dev/null, so that the screen is not filled with "Permission denied" messages when find reaches paths we are not allowed to read.
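
Since hard links cannot cross filesystems, the search can be limited to the filesystem that holds the file. The following is a minimal sketch under that assumption; find_all_links is a hypothetical helper, and df --output is a GNU coreutils option.

```shell
#!/bin/bash
# Sketch: find every name of FILE while staying on FILE's own filesystem.
# find_all_links is a hypothetical helper (GNU df and GNU find assumed).

# Usage: find_all_links FILE [START_DIR]
# START_DIR defaults to the mount point of the filesystem holding FILE.
find_all_links() {
  local file=$1 start=$2
  if [ -z "$start" ]; then
    # The last line of df output is the mount point; the first is a header.
    start=$(df --output=target -- "$file" | tail -n 1)
    [ -n "$start" ] || return 1
  fi
  # -xdev stops find from descending into other mounted filesystems, where
  # no hard link to FILE can exist. -samefile matches names that refer to
  # the same file as FILE, so no inode number has to be looked up first.
  find "$start" -xdev -samefile "$file" 2>/dev/null
}

# Run only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
  find_all_links "$@"
fi
```

Called with only a file name, the function searches from that file's mount point, which avoids scanning unrelated filesystems.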
