The implementation process of Linux process network traffic statistics

Preface

Linux has open source tools that collect network connection and process information in real time. A network connection record generally contains the basic five-tuple (source address, destination address, source port, destination port, protocol number) plus process information (pid, exe, cmdline, and so on). Most of both kinds of data can be read directly from the network connection status files in the Linux /proc directory (/proc/net/tcp, /proc/net/udp) and from the per-process status directory (/proc/pid/xx).

In some application security scenarios, process network connections must be combined with inbound and outbound traffic data to determine whether sensitive data is being maliciously transmitted within the intranet. Network monitoring may also show that a large share of a server's bandwidth is occupied without revealing which process is responsible. Both cases call for finer-grained, process-level network traffic data for comprehensive analysis.

Host-level network data can be found in the Linux /proc directory. For example, /proc/net/snmp provides detailed counters for the host's IP, ICMP, ICMPMsg, TCP, and UDP layers. In the IpExt section of /proc/net/netstat, InBcastPkts and OutBcastPkts count broadcast packets received and sent, while InOctets and OutOctets count bytes received and sent. Unfortunately, none of these files provides inbound and outbound traffic per process.
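To show what "host-level" means in practice, here is a minimal Python sketch that reads the IpExt counters. It assumes the usual /proc/net/netstat layout, where each section is a header line of field names followed by a line of values with the same prefix; the function names parse_netstat and read_ip_ext are hypothetical. The counters cover the whole host (or network namespace), which is exactly why they cannot attribute traffic to a process.

```python
from typing import Dict


def parse_netstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/netstat-style text into {section: {field: value}}.

    Each section occupies two consecutive lines with the same prefix:
    a header line of field names, then a line of integer values.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    result: Dict[str, Dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section = header[0].rstrip(":")
        result[section] = {name: int(val) for name, val in zip(header[1:], values[1:])}
    return result


def read_ip_ext(path: str = "/proc/net/netstat") -> Dict[str, int]:
    """Return the host-wide IpExt counters (InOctets, OutOctets, ...)."""
    with open(path) as f:
        return parse_netstat(f.read()).get("IpExt", {})
```

Sampling read_ip_ext() twice and subtracting InOctets/OutOctets gives the host's byte rate over the interval, but only in aggregate.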

To fill this gap, we follow the principle behind nethogs to implement a method for counting process-level network traffic.

Basic data

The following directories and files are involved: the network status files /proc/net/tcp and /proc/net/udp, and the process file descriptor directory /proc/pid/fd.

Network status file /proc/net/tcp

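We focus on the five-tuple, the connection state, and the inode number: the local and remote addresses are in the 2nd and 3rd columns, the state is in the 4th column, and the inode is in the 10th column. The protocol is implied by which file is read (tcp or udp).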

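The 2nd and 3rd columns hold the local and remote addresses as ip:port. The IPv4 address is printed as a hexadecimal number in host byte order and the port as a hexadecimal number; for example, on a little-endian host "0500000A:0016" decodes to 10.0.0.5, port 22.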
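The decoding rule above can be sketched in Python as follows. This is a minimal illustration for IPv4 entries only; the function name decode_ipv4_addr is hypothetical. Packing the parsed value in native byte order reverses exactly what the kernel did when printing it, so the same code works on little-endian and big-endian hosts.

```python
import socket
import struct


def decode_ipv4_addr(field: str) -> tuple:
    """Decode a /proc/net/tcp address field such as "0500000A:0016".

    The IPv4 part is the 32-bit address printed as hex in host byte order,
    so on a little-endian host (e.g. x86) the octets appear reversed.
    The port is a plain hexadecimal number.
    """
    hex_ip, hex_port = field.split(":")
    # "=I" packs the value in native byte order, undoing the host-order print.
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)
```

On an x86 host, decode_ipv4_addr("0500000A:0016") returns ("10.0.0.5", 22).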

The 4th column is the connection state, encoded as follows:

"01": "ESTABLISHED",
"02": "SYN_SENT",
"03": "SYN_RECV",
"04": "FIN_WAIT1",
"05": "FIN_WAIT2",
"06": "TIME_WAIT",
"07": "CLOSE",
"08": "CLOSE_WAIT",
"09": "LAST_ACK",
"0A": "LISTEN",
"0B": "CLOSING"

The 10th column is the inode number. In Linux, an inode identifies a file system object (a file, directory, device file, socket, pipe, and so on) and holds its metadata; here it is the inode of the socket that backs the connection.
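Putting the columns together, the following is a minimal Python sketch that parses one /proc/net/tcp line and builds the connection -> inode mapping. It assumes IPv4 entries and uses (local, remote) address pairs as keys, relying on the protocol being implied by the file. The names TcpEntry, parse_tcp_line, and load_conn_inodes are hypothetical.

```python
import socket
import struct
from typing import Dict, NamedTuple, Tuple

# Values of the "st" column (kernel TCP state numbers printed in hex).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

Addr = Tuple[str, int]


class TcpEntry(NamedTuple):
    local: Addr
    remote: Addr
    state: str
    inode: int


def _addr(field: str) -> Addr:
    hex_ip, hex_port = field.split(":")
    return socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16))), int(hex_port, 16)


def parse_tcp_line(line: str) -> TcpEntry:
    """Parse one data line of /proc/net/tcp (IPv4 only)."""
    f = line.split()
    # f[0] is the slot number, f[1]/f[2] the local/remote addresses,
    # f[3] the state, and f[9] (the 10th column) the socket inode.
    return TcpEntry(_addr(f[1]), _addr(f[2]), TCP_STATES.get(f[3], f[3]), int(f[9]))


def load_conn_inodes(path: str = "/proc/net/tcp") -> Dict[Tuple[Addr, Addr], int]:
    """Map (local, remote) -> socket inode for every line in the file."""
    with open(path) as fh:
        fh.readline()  # skip the header line
        entries = (parse_tcp_line(ln) for ln in fh if ln.strip())
        return {(e.local, e.remote): e.inode for e in entries}
```

Listening sockets have a remote address of 0.0.0.0:0, so they appear in the mapping too; a caller interested only in active connections could filter on state == "ESTABLISHED".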


Process file descriptors

The /proc/pid/fd directory lists the file descriptors currently opened by the process; 0, 1, and 2 are standard input, standard output, and standard error.

Each entry is a symbolic link. For a network connection, the link target has the form socket:[inode], and the number in brackets corresponds to the inode number in the network status file /proc/net/tcp.
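Recognizing a socket descriptor therefore comes down to matching the link target text. A minimal Python sketch, with the hypothetical helper name socket_inode:

```python
import re
from typing import Optional

# A socket descriptor under /proc/<pid>/fd is a symlink whose target
# text looks like "socket:[512505532]".
_SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target: str) -> Optional[int]:
    """Return the socket inode encoded in an fd link target, or None."""
    m = _SOCKET_LINK.match(link_target)
    return int(m.group(1)) if m else None
```

It would typically be called as socket_inode(os.readlink("/proc/25133/fd/10")); other descriptors such as regular files or pipe:[...] links return None.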

Taking process 25133 as an example, its file descriptors 10 and 12 are sockets with inode numbers 512505532 and 512473483 respectively, and the details of the corresponding connections can be found in /proc/net/tcp, as shown in the figure below.

Based on these files, a mapping from connection five-tuple to inode can be built from /proc/net/tcp, and a mapping from inode to process can be built from /proc/pid/fd.

In this way, the inode number serves as a bridge that links each process in the system to its network connections.
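The inode -> process side of the bridge can be sketched in Python by walking every numeric directory under /proc. This is a minimal illustration, not the original implementation; the function name build_inode_pid_map and the proc_root parameter (which lets the walk be pointed at a test directory) are hypothetical. Without root privileges, other users' fd directories are unreadable and are skipped.

```python
import os
import re
from typing import Dict

_SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def build_inode_pid_map(proc_root: str = "/proc") -> Dict[int, int]:
    """Walk <proc_root>/<pid>/fd and map each socket inode to its pid."""
    inode_to_pid: Dict[int, int] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-process entries such as "net" or "self"
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:  # descriptor closed between listdir and readlink
                continue
            m = _SOCKET_LINK.match(target)
            if m:
                # A socket shared by several processes keeps the first pid seen.
                inode_to_pid.setdefault(int(m.group(1)), int(entry))
    return inode_to_pid
```

Combined with a connection -> inode mapping built from /proc/net/tcp, this gives connection -> pid in two dictionary lookups.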

Implementation Process

To obtain network connection traffic in real time, the open source libpcap library is used to capture network packets on the Linux host. The overall flow is shown in the flowchart below and consists of the following five key steps.

Packet capture

Use the libpcap library to capture each network packet.

Packet parsing

Parse the packet's five-tuple (source address, destination address, source port, destination port, protocol number) and its size in bytes.
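A minimal Python sketch of this parsing step, operating on raw frame bytes such as those libpcap delivers. It assumes Ethernet II framing (the DLT_EN10MB link type) and handles only IPv4 TCP/UDP; PacketInfo and parse_packet are hypothetical names. It uses the captured frame length as the packet size; with libpcap one might instead use the len field of struct pcap_pkthdr, which reports the original on-wire length even when the capture is truncated.

```python
import socket
import struct
from typing import NamedTuple, Optional


class PacketInfo(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: int   # IP protocol number: 6 = TCP, 17 = UDP
    length: int  # bytes in the captured frame


def parse_packet(frame: bytes) -> Optional[PacketInfo]:
    """Extract the five-tuple from an Ethernet II / IPv4 / TCP-or-UDP frame."""
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # not IPv4
        return None
    ip = frame[14:]
    if ip[0] >> 4 != 4:
        return None
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    proto = ip[9]
    if proto not in (6, 17) or len(ip) < ihl + 4:
        return None
    src = socket.inet_ntoa(ip[12:16])
    dst = socket.inet_ntoa(ip[16:20])
    # TCP and UDP headers both begin with 16-bit source and destination ports.
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    return PacketInfo(src, dst, sport, dport, proto, len(frame))
```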

Cache Updates

Look up the inode number for the key formed by the five-tuple in ConnInodeHash. If it is not found, refresh both caches: re-read /proc/net/tcp and /proc/net/udp to rebuild ConnInodeHash (connection -> inode), then traverse all file descriptors under every /proc/pid/fd directory, keep those whose link targets start with socket:, and rebuild InodeProcessHash (inode -> process).

Hash lookup

Use the inode number found to look up the corresponding process pid in InodeProcessHash.
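The cache update and hash lookup steps can be sketched together in Python. This is a minimal illustration under stated assumptions: the two loaders are hypothetical hooks supplied by the caller (for example, functions that parse /proc/net/tcp and walk /proc/pid/fd), the connection key can be any hashable value such as a five-tuple, and the class name ProcessResolver is hypothetical. Only the cache names ConnInodeHash and InodeProcessHash come from the description above.

```python
from typing import Callable, Dict, Hashable, Optional


class ProcessResolver:
    """Resolve a connection key to a pid via two caches.

    conn_loader:  returns {connection key: socket inode}.
    inode_loader: returns {socket inode: pid}.
    Both are hypothetical hooks supplied by the caller.
    """

    def __init__(self,
                 conn_loader: Callable[[], Dict[Hashable, int]],
                 inode_loader: Callable[[], Dict[int, int]]) -> None:
        self._conn_loader = conn_loader
        self._inode_loader = inode_loader
        self.conn_inode: Dict[Hashable, int] = {}   # plays the role of ConnInodeHash
        self.inode_process: Dict[int, int] = {}     # plays the role of InodeProcessHash
        self.refreshes = 0

    def _refresh(self) -> None:
        # Cache miss: rebuild both mappings from their sources.
        self.conn_inode = self._conn_loader()
        self.inode_process = self._inode_loader()
        self.refreshes += 1

    def lookup(self, key: Hashable) -> Optional[int]:
        """Cache update (refresh on miss), then inode -> pid hash lookup."""
        inode = self.conn_inode.get(key)
        if inode is None:
            self._refresh()
            inode = self.conn_inode.get(key)
            if inode is None:
                return None  # still unknown, e.g. the connection already closed
        return self.inode_process.get(inode)
```

Because every miss triggers a full re-read of /proc, a production version would probably rate-limit refreshes so that traffic from short-lived or unknown connections does not cause a rescan per packet.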

Traffic Statistics

Determine the direction of the connection from the packet's addresses (for example, a packet whose source address belongs to the local host is outbound), and add the packet size to the process's inbound or outbound total.
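A minimal Python sketch of the direction check and accumulation. It assumes the caller supplies the set of local IP addresses (for example, gathered from the host's interface configuration) and treats any packet whose source is not local as inbound, which is a simplification for loopback traffic. TrafficCounter is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Set


class TrafficCounter:
    """Accumulate per-process inbound and outbound byte counts."""

    def __init__(self, local_addrs: Set[str]) -> None:
        self.local_addrs = local_addrs
        self.sent: Dict[int, int] = defaultdict(int)
        self.received: Dict[int, int] = defaultdict(int)

    def add(self, pid: int, src: str, dst: str, length: int) -> str:
        # A packet sent from one of this host's addresses is outbound;
        # anything else is counted as inbound.
        if src in self.local_addrs:
            self.sent[pid] += length
            return "out"
        self.received[pid] += length
        return "in"
```

Dividing the per-pid totals by the sampling interval yields the average send and receive rates mentioned in the summary.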

Summary

By capturing packets on a Linux host and combining the network status files with process file descriptors, we implemented a fine-grained method for collecting network traffic per process.

Using the Linux inode number as a bridge, processes can be associated with their network connections. This makes it possible to compute totals and averages of the data each process receives and sends, along with other dimensions, and to analyze the traffic of each individual connection of a process. These data are an important basis for host traffic security analysis, network monitoring, troubleshooting, and similar scenarios. Note, however, that continuously capturing packets with libpcap consumes CPU and can degrade host performance.

The above is the implementation process of Linux process network traffic statistics introduced by the editor. I hope it will be helpful to everyone. If you have any questions, please leave me a message and the editor will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!
