NodeJs high memory usage: a hands-on troubleshooting record

Preface

This investigation was triggered by an online container being scaled out automatically. Although the root cause turned out not to be a genuine OOM, I have summarized and recorded the whole process anyway. It felt like solving a case: finding clues step by step and verifying each one.

I personally think this kind of exercise matters for several reasons:

  1. From the programmer's perspective: pursue code quality, don't let problems slide, and keep the business stable
  2. From a resource perspective: reduce meaningless resource consumption
  3. From the company's perspective: lower server costs and save the company money

Environment: a NodeJs service running on the Tencent Taf platform.

Cause of the problem

It started when a scheduled (timer-driven) feature went live and the online containers were automatically scaled out. The NodeJs service itself only serves some interface queries and socket.io connections, with neither heavy traffic nor high concurrency, yet it somehow needed to scale out to 8 containers (each allocated 2G of memory). That made me suspect a memory leak. At the same time, out-of-memory messages occasionally appeared in the log.

Reasons for expansion

After asking the operations staff, I learned that the scaling was triggered by memory usage reaching the critical threshold.

Load conditions

First, we need to rule out the possibility that the high memory usage is simply caused by heavy service load, in which case it would be a normal business phenomenon.

Monitoring showed that traffic and CPU usage were not high at all; if anything, they were very low. Such high memory usage is therefore abnormal.

Because the problem is memory-related and the usage keeps climbing gradually and continuously, a memory leak came to mind. The common practice is to take a heap snapshot (a .heapsnapshot file) and analyze it.

Enter the container:

go node name

Enter the NodeJs project folder

/usr/local/app/taf/service_name/bin/src

Generate a snapshot:

// requires the heapdump package (npm install heapdump)
const heapdump = require('heapdump');
heapdump.writeSnapshot('./' + new Date().getTime() + '.heapsnapshot', function(err, filename) {
    console.log('dump written to', filename);
});
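
For reference, node-heapdump can also be triggered from outside the process: as long as the module has been required at startup, sending the process a USR2 signal on Linux makes it write a snapshot into the working directory (behaviour documented by the heapdump package; worth confirming against the installed version):

# <worker_pid> stands for the target Node process id
kill -USR2 <worker_pid>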

Because transferring files directly out of the container with the lrzsz command is slow, I used the scp command to copy the snapshot to a static resource server first and then downloaded it through the browser.

scp 1620374489828.heapsnapshot username@ip:/data/static/snapshot

Comparing the heap snapshots

One snapshot was taken right after the service started and another after it had been running for a while. Comparing the two, only keywords such as Websocket and Socket stood out.

Expanding the entries further did not reveal which function, if any, was responsible.

The snapshots did not seem to offer any clues. Since the business code of the whole project is not large, I reviewed it line by line, but found nothing suspicious that would cause an OOM. Line-by-line review is acceptable when the codebase is small; for a large project it is not cost-effective, and diagnostic tools should be used instead of jumping straight to code review.

After taking and reading the snapshots several more times, I still kept seeing the word "websocket", so I wondered whether the problem was caused by socket connections not being released.

I searched Google for "WebSocket memory leak" and the issue does exist. The suggested fix is to set perMessageDeflate to false to disable compression, which lower versions of socket.io enable by default. I added the option and observed memory usage for a while; there was no obvious drop, and after the release memory usage was still high.

Configuration syntax:

require('socket.io').listen(server, {perMessageDeflate: false});
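
A slightly fuller sketch of the same idea, attaching socket.io to a plain HTTP server (the file layout and port are made up for illustration; the relevant part is the perMessageDeflate option):

const http = require('http');
const server = http.createServer();

// Disable per-message compression on WebSocket connections.
// socket.io ^2.3.0 enables it by default, which adds memory overhead.
const io = require('socket.io')(server, { perMessageDeflate: false });

io.on('connection', function (socket) {
    socket.emit('hello', 'compression disabled');
});

server.listen(3000);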

The request sent by the client contains this field:

To explain: this parameter controls per-message data compression. It is enabled by default on the client side and disabled by default on the server side, because enabling it brings memory and performance costs; the official recommendation is to weigh this before deciding whether to enable it. However, lower versions of socket.io, such as ^2.3.0, do enable it on the server (apparently a bug; later versions changed the default back to disabled).

The extension is disabled by default on the server and enabled by default on the client. It adds a significant overhead in terms of performance and memory consumption so we suggest to enable it only if it is really needed.

https://github.com/socketio/socket.io/issues/3477#issuecomment-610265035

After deploying this change, memory usage remained high.

console.log

Another observation was that the Node service prints quite a few logs. Some articles about NodeJs memory leaks mention leaks caused by console output, so I commented out the console calls and kept observing. The result: memory usage was still high.
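
For this test I simply commented the calls out; a common alternative (shown here only as a sketch, not what was actually deployed) is to stub the console methods out in production:

// Replace console methods with no-ops in production so log strings are
// neither formatted nor written anywhere (a sketch, not the change made here).
if (process.env.NODE_ENV === 'production') {
    console.log = function () {};
    console.info = function () {};
}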

The trail seemed to go cold at this point.

log

A day later, I happened to look at the log file again. Some startup logs are printed when the service starts, and I noticed they appeared repeatedly:

This suggested that the startup code was being executed multiple times. To verify this hypothesis, I looked at the processes with the top command.

The top command

I also wanted to see the concrete memory usage, and found a surprising number of worker processes. Given the actual business load, 2 to 4 should be enough; why were so many child processes started?

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND                                                                                                                       
 90359 username 20 0 736m 38m 14m S 0.0 0.0 0:07.30 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90346 username 20 0 864m 38m 14m S 0.3 0.0 0:07.08 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90381 username 20 0 730m 38m 14m S 0.3 0.0 0:08.75 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90366 username 20 0 804m 37m 14m S 0.0 0.0 0:06.94 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90618 username 20 0 730m 37m 14m S 0.0 0.0 0:08.42 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90326 username 20 0 736m 37m 14m S 0.0 0.0 0:08.46 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90542 username 20 0 736m 37m 14m S 0.0 0.0 0:08.85 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90332 username 20 0 799m 37m 14m S 0.0 0.0 0:07.32 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90580 username 20 0 732m 37m 14m S 0.3 0.0 0:08.94 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90602 username 20 0 731m 37m 14m S 0.3 0.0 0:08.33 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90587 username 20 0 735m 37m 14m S 0.0 0.0 0:08.83 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90568 username 20 0 731m 37m 14m S 0.0 0.0 0:08.83 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90544 username 20 0 729m 37m 14m S 0.0 0.0 0:09.07 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90556 username 20 0 729m 37m 14m S 0.0 0.0 0:08.82 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90431 username 20 0 735m 37m 14m S 0.0 0.0 0:08.29 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90486 username 20 0 729m 37m 14m S 0.0 0.0 0:09.06 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90516 username 20 0 735m 37m 14m S 0.0 0.0 0:08.95 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90465 username 20 0 729m 37m 14m S 0.0 0.0 0:09.06 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90527 username 20 0 735m 37m 14m S 0.0 0.0 0:08.46 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90487 username 20 0 732m 37m 14m S 0.3 0.0 0:08.48 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90371 username 20 0 731m 37m 14m S 0.3 0.0 0:08.75 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90423 username 20 0 729m 36m 14m S 0.3 0.0 0:08.09 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90402 username 20 0 729m 36m 14m S 0.3 0.0 0:08.96 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90500 username 20 0 729m 36m 14m S 0.0 0.0 0:08.70 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90353 username 20 0 729m 36m 14m S 0.3 0.0 0:08.95 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90636 username 20 0 729m 36m 14m S 0.0 0.0 0:08.84 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90425 username 20 0 732m 36m 14m S 0.0 0.0 0:08.78 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90506 username 20 0 729m 36m 14m S 0.0 0.0 0:08.84 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90589 username 20 0 729m 36m 14m S 0.3 0.0 0:09.05 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90595 username 20 0 729m 36m 14m S 0.0 0.0 0:09.03 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90450 username 20 0 729m 36m 14m S 0.3 0.0 0:08.97 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90531 username 20 0 729m 36m 14m S 0.0 0.0 0:08.99 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90509 username 20 0 735m 36m 14m S 0.0 0.0 0:08.67 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90612 username 20 0 730m 36m 14m S 0.3 0.0 0:08.84 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90479 username 20 0 729m 36m 14m S 0.0 0.0 0:08.58 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90609 username 20 0 731m 36m 14m S 0.3 0.0 0:09.23 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90404 username 20 0 734m 36m 14m S 0.3 0.0 0:08.78 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90395 username 20 0 736m 36m 14m S 0.0 0.0 0:08.57 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90444 username 20 0 729m 36m 14m S 0.0 0.0 0:09.04 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90438 username 20 0 729m 36m 14m S 0.3 0.0 0:07.78 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90340 username 20 0 736m 36m 14m S 0.3 0.0 0:07.37 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90333 username 20 0 729m 36m 14m S 0.0 0.0 0:07.60 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90563 username 20 0 735m 36m 14m S 0.3 0.0 0:08.93 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90565 username 20 0 734m 36m 14m S 0.3 0.0 0:08.77 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90457 username 20 0 735m 36m 14m S 0.0 0.0 0:08.31 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90387 username 20 0 740m 36m 14m S 0.0 0.0 0:07.59 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90573 username 20 0 728m 35m 14m S 0.0 0.0 0:09.06 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90472 username 20 0 728m 35m 14m S 0.0 0.0 0:08.94 /usr/local/app/service_name/bin/src/index.js: worker process                                                      
 90313 username 20 0 588m 27m 13m S 0.0 0.0 0:00.46 /usr/local/app/service_name/bin/src/index.js: master process

Since the %MEM column does not reflect the actual memory usage inside the container (it always shows 0.0), we have to look at the VIRT, RES and SHR values instead. Their meanings are explained here: https://www.orchome.com/298

RES is the part of the process's virtual memory space that is currently mapped into physical memory, so it is the value we care about most. From the output, each worker process occupies roughly 35 to 38M, and there are 48 worker processes plus one master process.
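
To count the workers and get a rough total of their resident memory inside the container, something like the following can be used (a sketch; it relies on RSS being the sixth column of ps aux output):

# count the worker processes
ps aux | grep "index.js: worker process" | grep -v grep | wc -l

# sum their resident set size (RSS, reported in KB)
ps aux | grep "index.js: worker process" | grep -v grep | awk '{sum += $6} END {print sum " KB"}'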

Where did the 48 worker processes come from? Querying the number of logical CPUs shows that there are indeed 48.

# Total number of cores = number of physical CPUs x number of cores per physical CPU
# Total number of logical CPUs = number of physical CPUs x number of cores per physical CPU x number of hyperthreads

# View the number of physical CPUs
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

# View the number of cores in each physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq

# View the number of logical CPUs
cat /proc/cpuinfo | grep "processor" | wc -l
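
For reference, Node reports the same number through os.cpus(), which returns one entry per logical CPU and is what Cluster-based process managers typically default to:

const os = require('os');

// os.cpus() returns one entry per logical CPU; on this host it reports 48,
// matching the number of worker processes that were started.
console.log(os.cpus().length);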

Control the number of processes

Since I was not very familiar with the Taf platform, I learned that running NodeJs on Taf requires the corresponding package @tars/node-agent, and checked its usage documentation on the official site: https://tarscloud.github.io/TarsDocs/dev/tars.js/tars-node-agent.html

It has a -i option, which stands for instances:

-i, --instances

node-agent uses the Node.js native Cluster module to achieve load balancing.

You can configure the number of subprocesses (business processes) started by node-agent here:

If not configured (or configured to auto or 0 ), the number of child processes started is equal to the number of CPU physical cores.

When configured as max , the number of child processes started is equal to the number of CPUs (all cores).

If node-agent is started by tarsnode, the tars.application.client.asyncthread configuration section in the TARS configuration file will be read automatically.

You can also adjust it through TARS platform -> Edit service -> Number of asynchronous threads.

https://tarscloud.github.io/TarsDocs/dev/tars.js/tars-node-agent.html
This package is what starts the NodeJs service on Taf and provides its load-balancing capability. Since the number of child processes (business processes) was not configured explicitly, the default was used, and 48 worker processes were spawned, matching the number of logical CPUs found above. Each worker occupies memory of its own, so the overall memory usage stayed high.
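
For illustration, when node-agent is invoked directly, the worker count can be capped with the -i option quoted from the documentation above (the value and entry path here are placeholders; on Taf the same setting is applied through the platform configuration described next):

# start the service with 4 business processes instead of one per logical CPU
node-agent ./bin/src/index.js -i 4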

You can modify the configuration in the "Private Template":

Then restart the service and check the memory usage:

It is clear that the number of worker processes affects memory usage. The original memory usage chart showed continuous growth (which is why I suspected a memory leak in the first place); after reducing the number of worker processes, that behaviour no longer appeared. I will set it aside for now and keep observing.

To verify the relationship between the duplicated log output and the worker processes, I started the service with two worker processes and checked the logs: the startup logs were indeed printed twice.
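
This is the expected behaviour of the native Cluster module that node-agent builds on: the entry file is executed once per worker, so anything logged at module load time appears once per process. A minimal sketch:

const cluster = require('cluster');

if (cluster.isMaster) {
    // fork two workers; each one re-executes this file in worker mode
    cluster.fork();
    cluster.fork();
} else {
    // a "startup log" like this is printed once per worker, i.e. twice here
    console.log('startup log from worker ' + process.pid);
}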

Summary

Let’s review this issue:

Why wasn’t it discovered in time?

This probably has to do with my role as a front-end developer: I am not very sensitive to some characteristics of back-end services and had not paid attention to them, or simply did not know or understand them.

Can it be avoided in advance?

A monitoring or alarm mechanism could flag that the Node service's memory usage is trending upward. I am not yet familiar with that part of the Taf platform and will explore it later.
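
Even without platform-level alerting, a crude in-process check is possible; here is a sketch using Node's process.memoryUsage() (the threshold and reporting channel are placeholders):

// Warn when resident memory exceeds a threshold (1.5 GB is an arbitrary
// value for a 2G container); a real setup would report to a monitoring
// system rather than the log.
const THRESHOLD_BYTES = 1.5 * 1024 * 1024 * 1024;

setInterval(function () {
    const rss = process.memoryUsage().rss;
    if (rss > THRESHOLD_BYTES) {
        console.warn('High memory usage: rss=' + (rss / 1024 / 1024).toFixed(1) + ' MB');
    }
}, 60 * 1000);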
