How to write memory-efficient applications with Node.js

Preface

Software applications run in the computer's main memory, known as random access memory (RAM). JavaScript, and especially Node.js (server-side JavaScript), lets us build software projects of every size for end users. Managing a program's memory is always tricky, because a bad implementation can starve every other application running on the same server or system. C and C++ programmers have to care about memory management, since terrible memory leaks can hide in every corner of their code. But as a JS developer, have you ever really thought about this issue?

Since JS developers typically write web servers on dedicated, high-capacity machines, they may never notice the lag that multitasking causes. When developing a web server, for example, we usually also run several other applications, such as a database server (MySQL), a cache server (Redis), and whatever else the system requires. We need to be aware that these also consume the available main memory. If we write our application carelessly, we may degrade the performance of those other processes or even deny them memory allocations altogether. In this article we will work through a problem to understand NodeJS constructs such as streams, buffers, and pipes, and see how each of them helps us write memory-efficient applications.

Problem: Large file copying

If someone is asked to write a file copying program using NodeJS, they will quickly write something like the following:

const fs = require('fs');

let fileName = process.argv[2];
let destPath = process.argv[3];

fs.readFile(fileName, (err, data) => {
    if (err) throw err;

    fs.writeFile(destPath || 'output', data, (err) => {
        if (err) throw err;
        console.log('New file has been created!');
    });
});

This code simply takes the input file name and destination path, reads the entire file into memory, and then writes it out to the destination, which is not a problem for small files.

Now suppose we have a large file (greater than 2 GB) that we need to back up using this program. Take my 7.4 GB ultra-high-definition 4K movie as an example: I use the code above to copy it from the current directory to another directory.

$ node basic_copy.js cartoonMovie.mkv ~/Documents/bigMovie.mkv

Then I got this error message on Ubuntu (Linux):

/home/shobarani/Workspace/basic_copy.js:7
    if (err) throw err;
             ^

RangeError: File size is greater than possible Buffer: 0x7fffffff bytes
    at FSReqWrap.readFileAfterStat [as oncomplete] (fs.js:453:11)

As you can see, the error occurs while reading the file, because NodeJS only allows a maximum of 2 GB (0x7fffffff bytes) of data to be read into a single buffer. To get around this, when you do I/O-intensive operations (copying, processing, compression, etc.), you should take the memory situation into account.
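
If you want to confirm this limit on your own machine, here is a minimal sketch (assuming Node 8.2 or later, where the buffer module exposes its constants):

// Inspect the maximum buffer size allowed on this Node build
const { constants } = require('buffer');

// On 64-bit builds of Node 8/10 this prints 2147483647 (0x7fffffff, ~2 GB)
console.log(`Max buffer size: ${constants.MAX_LENGTH} bytes`);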

Streams and Buffers in NodeJS

To solve the above problem, we need a way to cut a large file into many chunks, and a data structure to hold those chunks. A buffer is the structure that stores binary data. Next, we need a way to read and write the chunks, and streams provide that capability.

Buffers

We can easily create a buffer using the Buffer object.

let buffer = new Buffer(10); // 10 is the size of the buffer
console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

In newer versions of NodeJS (>8), you can also write like this.

let buffer = Buffer.alloc(10);
console.log(buffer); // prints <Buffer 00 00 00 00 00 00 00 00 00 00>

If we already have some data, such as a string or an array, we can create a buffer from it.

let name = 'Node JS DEV';
let buffer = Buffer.from(name);
console.log(buffer); // prints <Buffer 4e 6f 64 65 20 4a 53 20 44 45 56>

Buffers have some important methods, such as buffer.toString() and buffer.toJSON(), that let you inspect the data they store.
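
For example, a small sketch reusing the 'Node JS DEV' buffer from above:

let buf = Buffer.from('Node JS DEV');

console.log(buf.toString());      // prints: Node JS DEV
console.log(buf.toString('hex')); // prints: 4e6f6465204a5320444556
console.log(buf.toJSON());        // prints: { type: 'Buffer', data: [ 78, 111, 100, 101, 32, 74, 83, 32, 68, 69, 86 ] }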

We will not be creating raw buffers ourselves to optimize our code. NodeJS and the V8 engine already do this for us, creating internal buffers (queues) when handling streams and network sockets.

Streams

In simple terms, a stream is like a doorway through which data enters and leaves a NodeJS object. In computer networking, ingress is the input action and egress is the output action; we will keep using these terms below.

There are four types of streams:

  • Readable stream (for reading data)
  • Writable stream (for writing data)
  • Duplex stream (can be used for both reading and writing)
  • Transform stream (a custom duplex stream used to process data, e.g. compressing or validating it; see the sketch after this list)
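
As a taste of the last kind, here is a minimal sketch of a transform stream that upper-cases whatever text flows through it (the stream name is just for illustration):

const { Transform } = require('stream');

// A transform stream: reads chunks, processes them, and pushes the result downstream
const upperCaser = new Transform({
    transform(chunk, encoding, callback) {
        // pass the transformed chunk to the next stage of the pipeline
        callback(null, chunk.toString().toUpperCase());
    }
});

// Usage: pipe stdin through the transform and out to stdout
process.stdin.pipe(upperCaser).pipe(process.stdout);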

The following sentence clearly explains why we should use streams.

One important goal of the stream API, particularly the stream.pipe() method, is to limit the buffering of data to acceptable levels such that sources and destinations of differing speeds will not overwhelm the available memory.

We need some way to get the job done without overwhelming the system. This is what we mentioned at the beginning of the article.

In the diagram above we have two types of streams: readable and writable. The .pipe() method is the most basic way to connect a readable stream to a writable stream. If the diagram doesn't make sense yet, don't worry: after you work through the examples below, you can come back to it and everything will click. Pipes are a fascinating mechanism, and we'll use two examples to illustrate them.
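
In its simplest form, connecting the two looks like this (a sketch with placeholder file names):

const fs = require('fs');

// .pipe() connects the ingress (readable) side to the egress (writable) side;
// data flows from source to destination chunk by chunk
fs.createReadStream('source.txt').pipe(fs.createWriteStream('destination.txt'));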

Solution 1 (Simply use streams to copy files)

Let's design a solution to the large-file copying problem described above. First we create two streams, then follow these steps:

1. Listen for chunks of data on the readable stream

2. Write each chunk to the writable stream

3. Track the progress of the copy

We'll name this program streams_copy_basic.js:

/*
    A file copy with streams and events - Author: Naren Arya
*/

const fs = require('fs');

let fileName = process.argv[2];
let destPath = process.argv[3];

const readable = fs.createReadStream(fileName);
const writable = fs.createWriteStream(destPath || "output");

fs.stat(fileName, (err, stats) => {
    if (err) throw err;

    this.fileSize = stats.size;
    this.counter = 1;
    this.fileArray = fileName.split('.');
    
    try {
        this.duplicate = destPath + "/" + this.fileArray[0] + '_Copy.' + this.fileArray[1];
    } catch(e) {
        console.error('File name is invalid! please pass the proper one');
    }
    
    process.stdout.write(`File: ${this.duplicate} is being created:`);
    
    readable.on('data', (chunk) => {
        let percentageCopied = ((chunk.length * this.counter) / this.fileSize) * 100;
        process.stdout.clearLine(); // clear current text
        process.stdout.cursorTo(0);
        process.stdout.write(`${Math.round(percentageCopied)}%`);
        writable.write(chunk);
        this.counter += 1;
    });
    
    readable.on('end', (e) => {
        process.stdout.clearLine(); // clear current text
        process.stdout.cursorTo(0);
        process.stdout.write("Successfully finished the operation");
    });
    
    readable.on('error', (e) => {
        console.log("Some error occurred: ", e);
    });
    
    writable.on('finish', () => {
        console.log("Successfully created the file copy!");
    });
    
});

In this program, we take the two file paths (source and destination) supplied by the user and create two streams to move chunks from the readable stream to the writable stream. We also define some variables to track the progress of the copy and print it to standard output (the console in this case). Along the way we subscribe to a few events:

data: triggered when a chunk of data is read

end: triggered when the readable stream has finished reading all the data

error: triggered when an error occurs while reading a chunk

By running this program, we successfully copy the large file (7.4 GB here).

$ time node streams_copy_basic.js cartoonMovie.mkv ~/Documents/4kdemo.mkv

However, if we watch the program's memory usage in the task manager while it runs, there is still a problem.

4.6 GB? The amount of memory our running program consumes makes no sense here, and it is very likely to block other applications.

What happened?

If you look closely at the read and write rates in the above figure, you will find some clues.

Disk Read: 53.4 MiB/s

Disk Write: 14.8 MiB/s

This means the producer is producing data faster than the consumer can absorb it. To hold the chunks that have been read but not yet written, the computer stores the excess data in RAM. That's why there is a spike in RAM usage.
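
To make the problem concrete: writable.write() actually returns false when the writable's internal buffer is full, and our code above ignores that signal. Here is a sketch of what handling it manually would look like (placeholder paths; this is not the article's final solution):

const fs = require('fs');

const readable = fs.createReadStream('source.file');
const writable = fs.createWriteStream('destination.file');

readable.on('data', (chunk) => {
    // write() returns false once the writable's internal buffer is full
    if (!writable.write(chunk)) {
        readable.pause(); // stop the fast producer
        // resume reading only after the consumer has drained its buffer
        writable.once('drain', () => readable.resume());
    }
});

readable.on('end', () => writable.end());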

The above code runs in 3 minutes and 16 seconds on my machine...

17.16s user 25.06s system 21% cpu 3:16.61 total

Solution 2 (file copying based on streams and automatic backpressure)

To overcome the above problem, we can modify the program so that the disk read and write speeds adjust to each other automatically. This mechanism is called backpressure. We don't need to do much: just pipe the readable stream into the writable stream, and NodeJS will take care of the backpressure for us.

Let's name this program streams_copy_efficient.js

/*
    A file copy with streams and piping - Author: Naren Arya
*/

const fs = require('fs');

let fileName = process.argv[2];
let destPath = process.argv[3];

const readable = fs.createReadStream(fileName);
const writable = fs.createWriteStream(destPath || "output");

fs.stat(fileName, (err, stats) => {
    if (err) throw err;

    this.fileSize = stats.size;
    this.counter = 1;
    this.fileArray = fileName.split('.');
    
    try {
        this.duplicate = destPath + "/" + this.fileArray[0] + '_Copy.' + this.fileArray[1];
    } catch(e) {
        console.error('File name is invalid! please pass the proper one');
    }
    
    process.stdout.write(`File: ${this.duplicate} is being created:`);
    
    readable.on('data', (chunk) => {
        let percentageCopied = ((chunk.length * this.counter) / this.fileSize) * 100;
        process.stdout.clearLine(); // clear current text
        process.stdout.cursorTo(0);
        process.stdout.write(`${Math.round(percentageCopied)}%`);
        this.counter += 1;
    });
    
    readable.pipe(writable); // Auto pilot ON!
    
    // In case we have an interruption while copying
    writable.on('unpipe', (e) => {
        process.stdout.write("Copy has failed!");
    });
    
});

In this example, we replaced the chunk-writing logic of the previous version with a single line of code:

readable.pipe(writable); // Auto pilot ON!

This pipe is where all the magic happens. It throttles the disk read and write speeds so that main memory (RAM) is never clogged.
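
For completeness: newer Node versions (10+) also ship a stream.pipeline() helper that does the same wiring while forwarding errors and cleaning up the streams. A sketch with placeholder paths:

const { pipeline } = require('stream');
const fs = require('fs');

pipeline(
    fs.createReadStream('source.file'),
    fs.createWriteStream('destination.file'),
    (err) => {
        // the callback fires once, with the error if any stage failed
        if (err) console.error('Copy failed:', err);
        else console.log('Copy finished successfully!');
    }
);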

Run it.

$ time node streams_copy_efficient.js cartoonMovie.mkv ~/Documents/4kdemo.mkv

We copy the same large file (7.4 GB) again; now let's look at the memory utilization.

Shocking! Now the Node program takes up only 61.9 MiB of memory. If you observe the read and write rates:

Disk Read: 35.5 MiB/s

Disk Write: 35.5 MiB/s

At any given time, the read and write rates stay consistent thanks to backpressure. What's even more surprising is that this optimized code runs 13 seconds faster than the previous version.

12.13s user 28.50s system 22% cpu 3:03.35 total

Thanks to NodeJS streams and pipes, the memory load was reduced by 98.68% and the execution time dropped as well. That's what makes pipes such a powerful mechanism.

61.9 MiB is the size of the internal buffer created by the readable stream. We can also request a custom number of bytes with the readable stream's read method:

const readable = fs.createReadStream(fileName);
readable.read(no_of_bytes_size);
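
As an aside not covered in the text above, the chunk size can also be set when the stream is created, via the highWaterMark option of fs.createReadStream() (the default for file streams is 64 KiB):

const fs = require('fs');

// Ask the readable stream to use 128 KiB chunks instead of the 64 KiB default
const readable = fs.createReadStream('source.file', {
    highWaterMark: 128 * 1024
});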

In addition to copying local files, this technique can be used to optimize many other I/O problems:

  • Handling a data flow from Kafka into a database
  • Handling a data stream from the file system, compressing it on the fly, and writing it back to disk (sketched after this list)
  • And more…
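
A sketch of the second item, compressing a file on the fly with a gzip transform stream from the built-in zlib module (paths are placeholders):

const fs = require('fs');
const zlib = require('zlib');

// readable -> transform (gzip) -> writable; backpressure is managed across all three stages
fs.createReadStream('input.file')
    .pipe(zlib.createGzip())
    .pipe(fs.createWriteStream('input.file.gz'));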

Conclusion

The main motivation for writing this article is to show that even though NodeJS provides good APIs, we can still accidentally write poorly performing code. If we pay more attention to the tools built into the runtime, we can better optimize the way our programs run.
