Monitoring the performance of specified processes on Linux with Python

Many tools, components, and programs for monitoring Linux servers are available online, but a server usually runs many processes at once. During performance testing in particular, several services may be deployed on the same server. If you monitor only the CPU and memory of the whole server, you cannot accurately locate the problem when one service misbehaves (other tools can do this too, of course). Monitoring only the specified processes is therefore necessary. With the requirement clear, I set out to write a performance monitoring script.

1. Overall approach

1. To make it easy to start and stop monitoring and to view the results at any time, a service is started with Flask. Sending a GET request starts or stops monitoring, or returns the monitoring results.
2. To control whether CPU, memory, and IO are monitored, monitoring runs in multiple threads.
3. To reduce dependence on other components, the monitoring results are written to the log.
4. To make the results easy to view, they are returned directly in HTML format.
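The start/stop idea in points 1 to 3 can be sketched with a worker thread that only samples while a flag is set. This is a minimal illustration, not the project's code: MonitorSketch, is_run, and samples are hypothetical names, and the "sample" is just a log line.

```python
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("monitor_sketch")


class MonitorSketch:
    """Hypothetical stand-in for the article's PerMon class."""

    def __init__(self, interval=0.01):
        self.interval = interval          # seconds between samples
        self.is_run = threading.Event()   # set -> sample, clear -> idle
        self.samples = 0                  # number of samples logged so far

    def write_cpu_mem(self):
        # Runs for the life of the service; samples only while is_run is set.
        while True:
            self.is_run.wait()            # block until monitoring is started
            logger.info("cpu/mem sample %d", self.samples)
            self.samples += 1
            time.sleep(self.interval)


monitor = MonitorSketch()
worker = threading.Thread(target=monitor.write_cpu_mem, daemon=True)
worker.start()

monitor.is_run.set()      # what a request with isRun=1 would do
time.sleep(0.1)
monitor.is_run.clear()    # what a request with isRun=0 would do
```

The thread is started once and never recreated; the HTTP handlers only flip the flag, which keeps starting and stopping cheap.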


2. Configuration File

config.py

IP = '127.0.0.1'
PORT = '5555'
LEVEL = 'INFO' # log level
BACKUP_COUNT = 9 # number of log backups to keep
LOG_PATH = 'logs' # log path
INTERVAL = 1 # interval (s) between two runs of the monitoring commands
SLEEPTIME = 3 # polling interval (s) while monitoring is stopped, waiting for the start condition
ERROR_TIMES = 5 # maximum number of failed command runs before monitoring stops automatically
IS_JVM_ALERT = True # Whether to alert when the frequency of Full GC is too high.
IS_MONITOR_SYSTEM = True # Whether to monitor the system's CPU and memory.
IS_MEM_ALERT = True # Whether to alert (by email) when memory is too low.
MIN_MEM = 2 # Minimum memory, unit: GB
# 0: don't clear cache, 1: clear page cache, 2: clear dentries and inodes caches, 3: both 1 and 2
# e.g. echo 1 >/proc/sys/vm/drop_caches
ECHO = 0
SMTP_SERVER = 'smtp.sina.com' # SMTP server
SENDER_NAME = '张三' # sender name
SENDER_EMAIL = '[email protected]' # sender's email
PASSWORD = 'UjBWYVJFZE9RbFpIV1QwOVBUMDlQUT09' # email password, base64 encoded
RECEIVER_NAME = 'baidu_all' # receiver name
RECEIVER_EMAIL = ['[email protected]', '[email protected]'] # receivers' emails
DISK = 'device1' # Disk that your application runs on
START_TIME = 'startTime.txt' # Stores the time each monitoring run was started.
FGC_TIMES = 'FullGC.txt' # Stores the time of every Full GC.
# html
HTML = '<html><body>{}</body></html>'
ERROR = '<p style="color:red">{}</p>'
HEADER = '<div id="header"><h2 align="center">Performance Monitor (pid={})</h2></div>'
ANALYSIS = '<div id="container" style="width:730px; margin:0 auto">{}</div>'

IP and PORT: the IP and port on which the monitoring service listens. The monitoring service must run on the same server as the monitored service.
BACKUP_COUNT: defaults to 9, so only the monitoring results of the last 9 days are kept.
INTERVAL: the time between two samples, 1s by default. It mainly applies to CPU and memory monitoring. When monitoring several ports or processes at the same time, set a smaller value.
ERROR_TIMES: the number of allowed command execution failures. Once failures exceed this number, monitoring stops automatically. This mainly applies when monitoring a specified process: if the process is killed, monitoring stops automatically and must be started again manually. When monitoring a specified port, monitoring also stops when the process behind the port is killed, but it starts again automatically once the port is back up.
IS_JVM_ALERT: for Java applications only. If Full GC is too frequent, an email reminder is sent. For general performance tests, the interval between two Full GCs should not be less than 3600 seconds.
IS_MONITOR_SYSTEM: whether to monitor the total CPU usage and remaining memory of the system.
IS_MEM_ALERT: whether to send an email reminder when the system's remaining memory is too low.
MIN_MEM: the minimum remaining memory allowed on the system, in GB.
ECHO: whether to release the cache when the remaining system memory is too low. 0 means do not release, 1 means release the page cache, 2 means release the dentries and inodes caches, and 3 means release both 1 and 2.
DISK: the disk device name, needed only for IO monitoring. Run df -h <file name> to see which disk the file is mounted on.
START_TIME: records the time of each manually triggered monitoring start.
FGC_TIMES: records the time of each Full GC, for troubleshooting.
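As an illustration of how IS_MEM_ALERT and MIN_MEM might work together, here is a small sketch that reads the available memory from /proc/meminfo-style text and compares it with the threshold. The article does not say how the project measures remaining memory, so the use of MemAvailable and the function names are assumptions.

```python
def available_mem_gb(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])     # the value is reported in kB
            return kb / 1024 / 1024
    raise ValueError("MemAvailable not found")


def should_alert(meminfo_text, min_mem_gb=2, is_mem_alert=True):
    """Alert only when alerts are enabled and memory is below the floor."""
    return is_mem_alert and available_mem_gb(meminfo_text) < min_mem_gb


# Sample text; on a real server this would come from open('/proc/meminfo').read()
sample = (
    "MemTotal:       16384000 kB\n"
    "MemFree:         1000000 kB\n"
    "MemAvailable:    1572864 kB\n"
)
print(should_alert(sample))   # True: 1.5 GB is below the 2 GB floor
```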

3. Interfaces and Services

server.py

import threading
from flask import Flask
import config as cfg
from performance_monitor import PerMon  # assumed import path, based on the file name in section 4

server = Flask(__name__)
permon = PerMon()
# Enable multithreading: one thread for CPU and memory, one for IO
t = [threading.Thread(target=permon.write_cpu_mem, args=()),
     threading.Thread(target=permon.write_io, args=())]
for thread in t:
    thread.start()

# Start monitoring
# http://127.0.0.1:5555/runMonitor?isRun=1&type=pid&num=23121&totalTime=3600
@server.route('/runMonitor', methods=['GET'])
def runMonitor(): ...  # body omitted in the original

# Draw the monitoring result graph
# http://127.0.0.1:5555/plotMonitor?type=pid&num=23121
@server.route('/plotMonitor', methods=['GET'])
def plotMonitor(): ...  # body omitted in the original

server.run(port=cfg.PORT, debug=True, host=cfg.IP)  # Start the service

By entering the corresponding URL in the browser address bar, you can start and stop monitoring and view the monitoring results.

URL parameters:

1. Start monitoring

http://127.0.0.1:5555/runMonitor?isRun=1&type=pid&num=23121&totalTime=3600
isRun: 1 starts monitoring, 0 stops it.
type and num: type=pid means num is a process ID, type=port means num is a port number. Several ports or processes can be monitored at the same time by separating them with commas (,).
totalTime: the total monitoring time in seconds. If totalTime is not passed, monitoring runs indefinitely by default.
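To make the parameter rules concrete, here is a minimal sketch that parses such a URL with the standard library. parse_run_query is a hypothetical helper, not a function from the project, which reads these values inside its Flask handler.

```python
from urllib.parse import parse_qs, urlsplit


def parse_run_query(url):
    """Split a runMonitor URL into (is_run, type, nums, total_time)."""
    query = parse_qs(urlsplit(url).query)
    is_run = query["isRun"][0] == "1"
    kind = query["type"][0]                          # 'pid' or 'port'
    nums = [int(n) for n in query["num"][0].split(",")]
    # totalTime is optional; None stands for "monitor until stopped".
    total = int(query["totalTime"][0]) if "totalTime" in query else None
    return is_run, kind, nums, total


url = "http://127.0.0.1:5555/runMonitor?isRun=1&type=port&num=8080,8081&totalTime=3600"
print(parse_run_query(url))   # (True, 'port', [8080, 8081], 3600)
```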

2. View monitoring results

http://127.0.0.1:5555/plotMonitor?type=port&num=23121&system=1&startTime=2019-08-03 08:08:08&duration=3600
type and num: type=pid means num is a process ID, type=port means num is a port number.
system: view the system monitoring results. If type and num are passed, only the process monitoring results are shown, whether or not system is passed. If only system is passed, without type and num, the system monitoring results are shown.
startTime: the start of the time range to view.
duration: the length of the time range to view, in seconds.
If startTime and duration are not passed, all results since the most recent monitoring start are shown by default. To view results for a specific period, pass both startTime and duration; the range shown runs from startTime to startTime+duration.
Note: If the service was restarted during the period you are viewing, its process ID changed. If you still enter the process ID from before the restart, you only see the results recorded for that process ID within the period. The port number rarely changes, so it is recommended to query by port number when viewing results.
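The startTime and duration arithmetic can be sketched as follows. The timestamp format is taken from the example URL; time_window and in_window are hypothetical helper names.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"   # format used in the example URL


def time_window(start_time, duration):
    """Return (start, end) where end = startTime + duration seconds."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    return start, start + timedelta(seconds=int(duration))


def in_window(timestamp, start, end):
    """True if a log timestamp string falls inside [start, end]."""
    return start <= datetime.strptime(timestamp, TIME_FORMAT) <= end


start, end = time_window("2019-08-03 08:08:08", 3600)
print(end)                                          # 2019-08-03 09:08:08
print(in_window("2019-08-03 08:30:00", start, end))  # True
```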

4. Monitoring

performance_monitor.py

The script uses the top command to monitor CPU and memory, jstat to monitor JVM memory (Java applications only), iotop to monitor a process's disk reads and writes, iostat to monitor disk IO, netstat to find the process listening on a port, and ps to get the service start time. The server must therefore provide all of these commands; install any that are missing.
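As an example of turning command output into numbers, the sketch below parses one process row of top in batch mode. It assumes top's default column order, which can differ with the top version or a customized configuration, and parse_top_line is a hypothetical name rather than the project's function.

```python
def parse_top_line(line):
    """Extract (pid, cpu_percent, mem_percent) from one process row of `top -b`."""
    fields = line.split()
    # Assumed default columns:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    return int(fields[0]), float(fields[8]), float(fields[9])


# Sample row; a real one could come from something like `top -b -n 1 -p 23121`
sample = "23121 root      20   0 4614632 512340  13456 S  12.5  3.1   1:23.45 java"
print(parse_top_line(sample))   # (23121, 12.5, 3.1)
```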

Note: A process can start multiple threads. Viewing IO at the process level shows nothing, while the IO of the threads the process starts is visible, but those threads keep changing. Monitoring the IO of a specified process is therefore not currently supported.

5. View monitoring results

draw_performance.py

1. Draw the CPU graph, the memory and JVM graph, the IO graph, and the handle count graph separately.
2. Calculate percentiles to summarize CPU and IO usage.
3. For garbage collection statistics, calculate the YGC and FGC counts of Java applications and their respective frequencies.
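A percentile over the sampled values can be computed in a few lines. The sketch below uses the nearest-rank method; the article does not say which percentile definition the project uses, so treat the method and the sample data as assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Ten CPU samples with one spike
cpu = [5.0, 7.5, 6.0, 90.0, 8.0, 6.5, 7.0, 5.5, 9.0, 6.0]
print(percentile(cpu, 90), percentile(cpu, 50))   # 9.0 6.5
```

Percentiles are less sensitive to a single spike than the maximum, which is why they summarize CPU and IO usage better than the peak value alone.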

The monitoring results are as follows:

[Figure: monitoring result charts]

6. Extension Functions

extern.py provides two functions:

1. Convert a port to a process ID

import os

try:
    result = os.popen(f'netstat -nlp|grep {port} |tr -s " "').readlines()
    res = [line.strip() for line in result if str(port) in line]
    p = res[0].split(' ')
    pp = p[3].split(':')[-1]  # port part of the local address column
    if str(port) == pp:
        pid = p[-1].split('/')[0]  # last column looks like "pid/program"
except Exception as err:
    logger.logger.error(err)
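To show what the field indices in the snippet above refer to, here is a variant that works on sample netstat lines instead of running the command. It assumes TCP rows in the usual netstat -nlp layout, where the seventh column is "pid/program"; pid_from_netstat is a hypothetical name, and unlike the original it checks every matching line rather than only the first.

```python
def pid_from_netstat(lines, port):
    """Return the PID listening on `port`, or None if no row matches exactly."""
    for line in lines:
        fields = line.split()             # split() also squeezes repeated spaces
        if len(fields) < 7:
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        # Compare the whole port so that 8080 does not match 18080.
        if local_port == str(port) and "/" in fields[6]:
            return int(fields[6].split("/")[0])
    return None


sample = [
    "tcp        0      0 0.0.0.0:18080           0.0.0.0:*               LISTEN      999/nginx",
    "tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      23121/java",
]
print(pid_from_netstat(sample, 8080))   # 23121
```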

2. Find the logs containing monitoring results

Overall idea:

(1) Based on the given start time and end time, find all log files that cover this period;
(2) In the log files found, find all log entries containing monitoring results;
(3) When drawing the graph, traverse all the log entries found.
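Step (1) can be sketched as a filter over file names. The naming scheme here is an assumption: rotated files are taken to end in '.YYYY-MM-DD' and the file without a date suffix is taken to hold today's entries. The project's real file names may differ, and logs_in_range is a hypothetical helper.

```python
from datetime import date, datetime


def logs_in_range(filenames, start, end, today):
    """Pick the log files whose day overlaps the period [start, end]."""
    selected = []
    for name in filenames:
        suffix = name.rsplit(".", 1)[-1]
        try:
            day = datetime.strptime(suffix, "%Y-%m-%d").date()
        except ValueError:
            day = today                   # no date suffix: the active log file
        if start.date() <= day <= end.date():
            selected.append(name)
    return sorted(selected)


files = ["monitor.log", "monitor.log.2019-08-02",
         "monitor.log.2019-08-03", "monitor.log.2019-08-04"]
picked = logs_in_range(
    files,
    start=datetime(2019, 8, 3, 23, 30),
    end=datetime(2019, 8, 4, 0, 30),
    today=date(2019, 8, 5),
)
print(picked)   # ['monitor.log.2019-08-03', 'monitor.log.2019-08-04']
```

A period that crosses midnight selects two files, which is why step (3) has to traverse every log found rather than a single file.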

Additional notes

1. To make the most recent monitoring start time easy to find, each monitoring start time is written to the startTime.txt file.

2. To help troubleshoot possible problems in Java applications, the time of each Full GC is written to the FullGC.txt file.

Project address: https://github.com/leeyoshinari/performance_monitor

