Monitoring the performance of specified processes on Linux with Python

Many tools, components, and programs for monitoring Linux servers are available online. A server, however, usually runs many processes at once, and during performance testing several services may be deployed on the same machine. If you monitor only the CPU and memory of the whole server, you cannot quickly and accurately tell which service is responsible when one of them has a performance problem (although other tools can help with that). It is therefore useful to monitor only specified processes. With that requirement clear, I wrote a performance monitoring script.

1. Overall approach

1. To make it easy to start and stop monitoring and to view the results at any time, the script runs as a Flask service. Sending GET requests starts monitoring, stops it, and shows the results.
2. CPU, memory, and IO are monitored in separate threads, so that each kind of monitoring can be switched on or off.
3. To reduce dependencies on other components, the monitoring results are written to log files.
4. To make the results easy to read, they are returned directly as HTML.
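
The first three points can be sketched with the standard library alone. This is only an illustration of the idea, not the project's code: the class name MonitorSketch, the is_run flag, and the worker names are hypothetical, and the "sample" is just a counter and a log line where the real script would run a monitoring command. In the real service, the GET handlers would set or clear the flag.

```python
import logging
import threading
import time

logger = logging.getLogger("monitor_sketch")


class MonitorSketch:
    """Hypothetical stand-in for the monitor object used by the service."""

    def __init__(self, interval=0.01):
        self.interval = interval          # seconds between two samples
        self.is_run = threading.Event()   # set: sampling on, cleared: sampling off
        self.samples = 0                  # number of samples taken so far
        self._exit = threading.Event()    # set: worker threads should end
        self._lock = threading.Lock()
        self._threads = []

    def _loop(self, name):
        # Each worker keeps running and samples only while is_run is set.
        while not self._exit.is_set():
            if self.is_run.is_set():
                with self._lock:
                    self.samples += 1
                logger.info("%s sample taken", name)  # results go to the log
            time.sleep(self.interval)

    def start_threads(self):
        # One thread for CPU and memory, one for IO.
        for name in ("cpu_mem", "io"):
            thread = threading.Thread(target=self._loop, args=(name,), daemon=True)
            thread.start()
            self._threads.append(thread)

    def shutdown(self):
        self._exit.set()
        for thread in self._threads:
            thread.join()
```

A request handler that starts monitoring would call is_run.set(), and one that stops it would call is_run.clear(); the worker threads themselves are never restarted.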

2. Configuration File

config.py

IP = '127.0.0.1'
PORT = '5555'
LEVEL = 'INFO' # log level
BACKUP_COUNT = 9 # log backup counter
LOG_PATH = 'logs' # log path
INTERVAL = 1 # seconds between two runs of the monitoring commands.
SLEEPTIME = 3 # seconds between polls while monitoring is stopped, waiting for the start condition.
ERROR_TIMES = 5 # number of command failures after which monitoring stops automatically.
IS_JVM_ALERT = True # Whether to alert when the frequency of Full GC is too high.
IS_MONITOR_SYSTEM = True # Whether to monitor system's CPU and Memory.
IS_MEM_ALERT = True # Whether to alert when memory is too low. Alert by sending email.
MIN_MEM = 2 # Minimum remaining memory, unit: GB
# 0: don't clear cache, 1: clear page cache, 2: clear dentries and inodes, 3: clear both 1 and 2;
# e.g. echo 1 >/proc/sys/vm/drop_caches
ECHO = 0
SMTP_SERVER = 'smtp.sina.com' # SMTP server
SENDER_NAME = '张三' # sender name
SENDER_EMAIL = '[email protected]' # sender's email
PASSWORD = 'UjBWYVJFZE9RbFpIV1QwOVBUMDlQUT09' # email password, base64 encode.
RECEIVER_NAME = 'baidu_all' # receiver name
RECEIVER_EMAIL = ['[email protected]', '[email protected]'] # receiver's email
DISK = 'device1' # Disk that your application runs on
START_TIME = 'startTime.txt' # Stores the time each monitoring run was started.
FGC_TIMES = 'FullGC.txt' # Stores the time of every Full GC.
# HTML templates
HTML = '<html><body>{}</body></html>'
ERROR = '<p style="color:red">{}</p>'
HEADER = '<div id="header"><h2 align="center">Performance Monitor (pid={})</h2></div>'
ANALYSIS = '<div id="container" style="width:730px; margin:0 auto">{}</div>'

IP and PORT: The IP address and port the monitoring service listens on. The service must run on the same server as the monitored process.
BACKUP_COUNT: Defaults to 9, which keeps only the last 9 days of monitoring results.
INTERVAL: The time between two samples, 1 s by default. It mainly applies to CPU and memory monitoring. When monitoring several ports or processes at the same time, set it to a smaller value.
ERROR_TIMES: The number of command failures allowed before monitoring stops automatically. This mainly matters when monitoring a specified process: if the process is killed, monitoring stops and must be started again manually. When monitoring a specified port, monitoring also stops when the process behind the port is killed, but it resumes automatically once a process is listening on the port again.
IS_JVM_ALERT: Java applications only. If Full GC happens too often, an email alert is sent. In a typical performance test, the interval between two Full GCs should not be shorter than 3600 seconds.
IS_MONITOR_SYSTEM: Whether to monitor the system's total CPU usage and remaining memory.
IS_MEM_ALERT: Whether to send an email alert when the system's remaining memory is too low.
MIN_MEM: The minimum remaining memory allowed on the system, in GB.
ECHO: Whether to release caches when the remaining system memory is too low. 0 releases nothing, 1 releases the page cache, 2 releases the dentries and inodes caches, and 3 releases both.
DISK: The disk (device) name. To monitor IO, enter the disk name; run df -h <file name> to see which disk a file is on.
START_TIME: The file that records the time of each manual monitoring start.
FGC_TIMES: The file that records the time of each Full GC, for troubleshooting.
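
The IS_MEM_ALERT, MIN_MEM, and ECHO settings together describe a low-memory check. A minimal sketch of that logic follows, assuming the remaining memory is taken from the MemAvailable field of /proc/meminfo; the article does not say which source the script actually uses. The function names are hypothetical, and the sketch only builds the drop_caches command rather than running it.

```python
def available_mem_gb(meminfo_text):
    """Return MemAvailable from /proc/meminfo text in GB (the file reports kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024 / 1024
    raise ValueError("MemAvailable not found")


def low_memory_check(meminfo_text, min_mem_gb, echo):
    """Return (alert, command).

    alert is True when the available memory is below min_mem_gb.
    command is the cache-release command selected by echo (1, 2 or 3),
    or None when memory is sufficient or echo is 0.
    """
    if available_mem_gb(meminfo_text) >= min_mem_gb:
        return False, None
    if echo in (1, 2, 3):
        return True, f"echo {echo} > /proc/sys/vm/drop_caches"
    return True, None
```

On a real server the text would come from reading /proc/meminfo, and writing to drop_caches requires root privileges.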

3. Interfaces and Services

server.py

import threading
from flask import Flask
import config as cfg  # the config.py shown above
from performance_monitor import PerMon  # assumed location of the PerMon class

server = Flask(__name__)
permon = PerMon()
# Enable multithreading: one thread for CPU and memory, one for IO
t = [threading.Thread(target=permon.write_cpu_mem, args=()),
     threading.Thread(target=permon.write_io, args=())]
for thread in t:
    thread.start()

# Start monitoring
# http://127.0.0.1:5555/runMonitor?isRun=1&type=pid&num=23121&totalTime=3600
@server.route('/runMonitor', methods=['GET'])
def runMonitor(): ...  # body omitted

# Draw the monitoring result graph
# http://127.0.0.1:5555/plotMonitor?type=pid&num=23121
@server.route('/plotMonitor', methods=['GET'])
def plotMonitor(): ...  # body omitted

server.run(port=cfg.PORT, debug=True, host=cfg.IP)  # Start the service

By entering the corresponding URL in the browser address bar, you can start and stop monitoring and view the monitoring results.

URL parameter passing:

1. Start monitoring

http://127.0.0.1:5555/runMonitor?isRun=1&type=pid&num=23121&totalTime=3600
isRun: 1 starts monitoring, 0 stops it;
type and num: type=pid means num is a process ID, type=port means num is a port number. Several ports or processes can be monitored at the same time by separating them with commas (the ASCII comma);
totalTime: the total monitoring time in seconds. If totalTime is not passed, monitoring continues indefinitely by default.
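
To make the parameter rules concrete, here is a small sketch that parses such a URL with the standard library. It is not the service's handler (Flask would normally read the values from its request object); the function name parse_run_request and the returned dictionary keys are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_run_request(url):
    """Parse the runMonitor query parameters described above."""
    query = parse_qs(urlsplit(url).query)

    kind = query.get("type", [None])[0]
    if kind not in (None, "pid", "port"):
        raise ValueError(f"unsupported type: {kind}")

    # num may hold several comma-separated process IDs or port numbers.
    raw_nums = query.get("num", [""])[0]
    nums = [int(item) for item in raw_nums.split(",") if item.strip()]

    # A missing totalTime means "monitor until stopped".
    total = query.get("totalTime")

    return {
        "is_run": query.get("isRun", ["0"])[0] == "1",
        "type": kind,
        "nums": nums,
        "total_time": int(total[0]) if total else None,
    }
```

For example, passing the URL shown above yields is_run=True, type='pid', nums=[23121], and total_time=3600.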

2. View monitoring results

http://127.0.0.1:5555/plotMonitor?type=port&num=23121&system=1&startTime=2019-08-03 08:08:08&duration=3600
type and num: type=pid means num is a process ID, type=port means num is a port number;
system: requests the system monitoring results. If type and num are passed, only the process monitoring results are shown, whether or not system is passed. If only system is passed, without type and num, the system monitoring results are shown.
startTime: the start time of the results to view;
duration: the length of the period to view, in seconds;
If startTime and duration are not passed, all results since monitoring was last started are shown by default. To view the results for a specific period, pass both; the period then runs from startTime to startTime+duration.
Note: If the service was restarted during the period you are viewing, its process ID changed. If you enter the process ID from before the restart, you only see the results recorded for that process ID, up to the restart. The port number rarely changes, so it is recommended to use the port number when viewing results.
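
The time range rule (startTime to startTime+duration) can be written out as follows. This is a sketch with hypothetical names; the timestamp format is taken from the example URL above. The helper encode_start_time is only relevant when a script builds the URL itself, since startTime contains a space.

```python
from datetime import datetime, timedelta
from urllib.parse import quote

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used by startTime in the example URL


def query_window(start_time, duration):
    """Return (start, end) datetimes for a startTime string and a duration in seconds."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    return start, start + timedelta(seconds=int(duration))


def encode_start_time(start_time):
    """Percent-encode startTime so it can be placed in a query string."""
    return quote(start_time)
```

With startTime=2019-08-03 08:08:08 and duration=3600, the window ends at 2019-08-03 09:08:08.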

4. Monitoring

performance_monitor.py

The script uses the top command to monitor CPU and memory, jstat to monitor JVM memory (Java applications only), iotop to monitor a process's disk reads and writes, iostat to monitor disk IO, netstat to find the process listening on a port, and ps to get the service's start time. The server must therefore provide all of these commands; install any that are missing.

Note: Because a process can start multiple threads, viewing IO at the process level shows nothing, while the IO of an individual thread started by the process is visible. The threads keep changing, however, so monitoring the IO of a specified process is not currently supported.
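
As an illustration of the top-based sampling, the sketch below runs top once in batch mode for a single process and reads the %CPU and %MEM columns. The exact command line and parsing used by performance_monitor.py are not shown in this article, so the flags chosen here (-b, -n 1, -p) and the function names are assumptions. The parser locates the columns from the header line and assumes decimal points in the numbers, which depends on the locale.

```python
import subprocess


def parse_top(output, pid):
    """Return (%CPU, %MEM) for pid from `top -b -n 1` output, or None if absent."""
    columns = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID":
            columns = fields  # header row: remember the column order
            continue
        if columns and fields[0] == str(pid):
            cpu = float(fields[columns.index("%CPU")])
            mem = float(fields[columns.index("%MEM")])
            return cpu, mem
    return None


def sample_process(pid):
    """Run top once in batch mode for one process and parse the result."""
    result = subprocess.run(
        ["top", "-b", "-n", "1", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return parse_top(result.stdout, pid)
```

Keeping the parsing separate from the command call makes it possible to check the parser against saved output without running top.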

5. View monitoring results

draw_performance.py

1. Plot the CPU graph, the memory and JVM graph, the IO graph, and the handle count graph separately;
2. Calculate percentiles to summarize CPU and IO usage;
3. For Java applications, calculate the YGC and FGC counts and their frequencies to summarize garbage collection.
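
The statistics in points 2 and 3 are simple to compute. The sketch below uses the nearest-rank definition of a percentile and expresses GC frequency as the average number of seconds between collections. Both choices are assumptions, since the article does not say which definitions draw_performance.py uses, and the function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gc_interval(gc_count, duration_seconds):
    """Average seconds between collections, or None if no collection happened."""
    return duration_seconds / gc_count if gc_count else None
```

For ten CPU samples 10, 20, ..., 100, the 90th percentile under this definition is 90; two Full GCs in a 7200-second run give an average interval of 3600 seconds.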

The monitoring results are as follows:

[Figure: monitoring results]

6. Extension Functions

extern.py provides two functions

1. Convert a port number to a process ID

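import os

pid = None  # stays None when no process is listening on the port
try:
    # List listening sockets, keep lines mentioning the port, squeeze repeated spaces
    result = os.popen(f'netstat -nlp | grep {port} | tr -s " "').readlines()
    for line in result:
        p = line.strip().split(' ')
        # The fourth column is the local address (ip:port); match the port exactly
        if len(p) > 3 and p[3].split(':')[-1] == str(port):
            pid = p[-1].split('/')[0]  # last column is expected to look like "PID/program"
            break
except Exception as err:
    logger.logger.error(err)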

2. Find the logs containing monitoring results

Overall idea:

(1) Based on the input start and end times, find all log files that cover the period;
(2) In those log files, find all log entries containing monitoring results;
(3) When drawing the graph, traverse all the entries found.
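
The three steps can be sketched as below. The log layout is an assumption made for illustration: one log file per day, each entry starting with a "YYYY-MM-DD HH:MM:SS" timestamp, and result entries containing a marker string. The names files_in_range, result_lines, and the marker are hypothetical; the real format is defined by the project's logger.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp prefix of each log entry


def files_in_range(file_dates, start, end):
    """Step 1: pick the log files whose day overlaps the period.

    file_dates maps a file name to the date it covers (one file per day assumed).
    """
    return sorted(
        name for name, day in file_dates.items()
        if start.date() <= day <= end.date()
    )


def result_lines(lines, start, end, marker):
    """Steps 2 and 3: keep entries inside the period that contain the marker."""
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIME_FORMAT)
        except ValueError:
            continue  # not a timestamped entry, e.g. a traceback line
        if start <= stamp <= end and marker in line:
            selected.append(line.rstrip("\n"))
    return selected
```

The plotting code would then iterate over the selected entries from every file returned by the first function.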

Additional notes

1. To make it easy to check when monitoring was last started, each monitoring start time is written to the startTime.txt file;

2. To help troubleshoot possible problems in Java applications, the time of each Full GC is written to the FullGC.txt file.

Project address: https://github.com/leeyoshinari/performance_monitor
