In-depth understanding of Worker threads in Node.js

Overview

For many years, Node.js was not the best choice for implementing highly CPU-intensive applications, mainly due to JavaScript's single-threaded nature. As a solution to this problem, Node.js v10.5.0 introduced the experimental concept of "worker threads" through the worker_threads module, and it became a stable feature starting from Node.js v12 LTS. This article explains how it works and how to get the best performance from using Worker threads.

The history of CPU-bound applications in Node.js

Before worker threads, there were multiple ways to execute CPU-intensive applications in Node.js. Some of them are:

  • Use the child_process module and run CPU-intensive code in a child process
  • Use the cluster module to run multiple CPU-intensive operations in multiple processes
  • Use third-party modules such as Microsoft's Napa.js

However, none of these solutions was widely adopted, owing to performance limitations, additional complexity, and poor documentation.

Use worker threads for CPU-intensive operations

Although worker_threads is an elegant solution to JavaScript's concurrency problem, it does not bring multithreading features to the JavaScript language itself. Instead, worker_threads achieves concurrency by running your application in multiple isolated JavaScript workers, with Node providing the communication between each worker and its parent. Sounds confusing?

In Node.js, each worker has its own V8 instance and event loop. However, unlike child_process, workers can share memory (for example, through SharedArrayBuffer instances).

The above concepts will be explained later. Let's first take a quick look at how to use Worker threads. A naive use case would look like this:

// worker-simple.js

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Main thread: start a worker from this same file and hand it a number.
  const worker = new Worker(__filename, { workerData: { num: 5 } });
  worker.once('message', (result) => {
    console.log('square of 5 is :', result);
  });
} else {
  // Worker thread: compute the square and send it back to the parent.
  parentPort.postMessage(workerData.num * workerData.num);
}

In the example above, we pass a number to a separate worker, which calculates its square. After the computation, the child worker sends the result back to the main worker thread. Despite its apparent simplicity, this can be a bit confusing if you are new to Node.js worker threads.

How do Worker threads work?

The JavaScript language does not have multithreading features. Therefore, Node.js Worker threads behave in a different way than traditional multithreading in many other high-level languages.

In Node.js, the responsibility of a worker is to execute a piece of code (worker script) provided by the parent worker. This worker script will run in isolation from other workers and will be able to pass messages between itself and the parent worker. The worker script can be either a standalone file or a text script that can be parsed by eval. In our case, we use __filename as the worker script because both the parent and child worker code are in the same script file, with the isMainThread property determining its role.
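To illustrate the second option, here is a minimal sketch that passes the worker script as text instead of a file path. The `eval: true` option is part of the worker_threads API; the helper name `squareInWorker` and the `workerSource` string are made up for this illustration.

```javascript
const { Worker } = require('worker_threads');

// Worker script supplied as text instead of a file path.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  parentPort.postMessage(workerData.num * workerData.num);
`;

// Hypothetical helper: runs the inline script in a worker and
// resolves with the first message the worker sends back.
function squareInWorker(num) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { num } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

squareInWorker(7).then((result) => console.log('square of 7 is :', result));
```

Running this prints `square of 7 is : 49`. Passing the script as text is handy for small demos, while a separate file is usually easier to maintain for real workloads.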

Each worker is connected to its parent worker through a message channel. The child worker can use the parentPort.postMessage() function to write messages to the message channel, and the parent worker writes messages to the message channel by calling the worker.postMessage() function on the worker instance. Take a look at Figure 1:

A Message Channel is a simple communication channel, with two ends called 'ports'. In JavaScript/NodeJS terminology, the two ends of a Message Channel are called port1 and port2.
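Both directions of the channel can be sketched in a few lines. In this example the parent writes with worker.postMessage() and the child replies with parentPort.postMessage(); the helper name `shout` and the upper-casing task are assumptions made for illustration only.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // Child side: read one message from the channel, then write the reply back.
  parentPort.once('message', (text) => {
    parentPort.postMessage(text.toUpperCase());
  });
`;

// Hypothetical helper: sends a string to a worker and resolves with its reply.
function shout(text) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.postMessage(text); // parent side: write to the channel
  });
}

shout('hello').then((reply) => console.log(reply));
```

This prints `HELLO`. Because the child listens with once(), it has nothing left to wait for after replying, so the worker exits on its own.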

How do Node.js workers work in parallel?

Now comes the key question. JavaScript does not directly provide concurrency, so how can two Node.js workers run in parallel? The answer is V8 isolates.

A V8 isolate is a separate instance of the Chrome V8 runtime, with its own JS heap and microtask queue. This allows each Node.js worker to run its JavaScript code in complete isolation from other workers. The disadvantage is that workers cannot directly access other workers' heap data.
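One visible consequence of this isolation is that ordinary objects handed to a worker are copied, so changes made inside the worker never reach the parent's heap. The sketch below contrasts that with a SharedArrayBuffer, which is the explicit opt-in for memory that both sides can see. The names `mutateInWorker`, `plain`, and `shared` are invented for this example.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  workerData.plain.count += 1;                          // changes the worker's copy only
  Atomics.add(new Int32Array(workerData.shared), 0, 1); // changes the shared memory
  parentPort.postMessage('done');
`;

// Hypothetical helper: lets a worker mutate both values, then reports
// what the parent thread can observe afterwards.
function mutateInWorker() {
  const plain = { count: 0 };              // cloned when passed as workerData
  const shared = new SharedArrayBuffer(4); // shared, not cloned
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { plain, shared },
    });
    worker.once('error', reject);
    worker.once('message', () => {
      resolve({
        plainCount: plain.count,
        sharedCount: Atomics.load(new Int32Array(shared), 0),
      });
    });
  });
}

mutateInWorker().then((seen) => console.log(seen));
```

The parent still sees `plainCount: 0`, because the worker only modified its own copy, while `sharedCount` is 1.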

Further reading: How does JS work in browsers and Node?

Thus, each worker will have its own copy of the libuv event loop independent of the parent worker and other workers.
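A rough way to observe the separate event loops is to block a worker with a busy loop and check that timers on the parent keep firing in the meantime. This is only a sketch: the helper name `countTicksWhileWorkerBlocks` and the 10 ms interval are arbitrary choices, and the exact tick count depends on the machine.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const end = Date.now() + workerData.ms;
  while (Date.now() < end) {} // blocks this worker's event loop only
  parentPort.postMessage('done');
`;

// Hypothetical helper: counts how many parent-side timer ticks fire
// while the worker is stuck in its busy loop.
function countTicksWhileWorkerBlocks(ms) {
  return new Promise((resolve, reject) => {
    let ticks = 0;
    const timer = setInterval(() => { ticks += 1; }, 10);
    const worker = new Worker(workerSource, { eval: true, workerData: { ms } });
    worker.once('error', (err) => { clearInterval(timer); reject(err); });
    worker.once('message', () => { clearInterval(timer); resolve(ticks); });
  });
}

countTicksWhileWorkerBlocks(200).then((ticks) => {
  console.log('parent timer ticks while the worker was busy:', ticks);
});
```

If the same busy loop ran on the main thread, no tick would fire until it finished.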

Crossing the JS/C++ boundary

Instantiating a new worker and providing communication with parent/sibling JS scripts are both handled by the C++ side of the worker implementation. At the time of writing, this is implemented in node_worker.cc (https://github.com/nodejs/node/blob/921493e228/src/node_worker.cc).

The worker implementation is exposed to user-level JavaScript through the worker_threads module. The JS implementation is split into two scripts, which I will refer to as:

  • Initialization script worker.js: responsible for initializing the worker instance and establishing the initial parent-child worker communication, so that worker metadata can be passed from the parent worker to the child worker. (https://github.com/nodejs/node/blob/921493e228/lib/internal/worker.js)
  • Execution script worker_thread.js: executes the user's worker JS script with the workerData supplied by the user and other metadata provided by the parent worker. (https://github.com/nodejs/node/blob/921493e228/lib/internal/main/worker_thread.js)

Figure 2 explains this process in a clearer way:

Based on the above, we can divide the worker setup process into two stages:

  • Worker initialization
  • Running the worker

Let’s take a look at what happens at each stage:

Initialization steps

1. The user-level script creates a worker instance by using worker_threads.

2. Node's parent worker initialization script calls C++ and creates an empty worker object. At this point, the created worker is just a simple C++ object that has not been started.

3. When the C++ worker object is created, it generates a thread ID and assigns it to itself.

4. At the same time, an empty initialization message channel (let's call it IMC) is created by the parent worker. This is shown in the grey “Initialisation Message Channel” section in Figure 2.

5. A public JS message channel (called PMC) is created by the worker initialization script. This channel is used by user-level JS to pass messages between parent and child workers. This part is mainly described in Figure 1 and is also marked in red in Figure 2.

6. The Node parent worker initialization script calls C++ and writes the initial metadata that needs to be sent to the worker execution script to the IMC.

What is the initial metadata? It is the data that the worker execution script needs to know in order to start the worker, including the name of the script to run, the worker data, port2 of the PMC, and some other information.

In our example, the initialization metadata is as follows:

Hey, worker execution script! Could you please run worker-simple.js with worker data like {num: 5}? Please also pass the PMC's port2 to it, so that the worker can read data from the PMC.

The following snippet shows how initialization data is written to the IMC:

const kPublicPort = Symbol('kPublicPort');
// ...

const { port1, port2 } = new MessageChannel();
this[kPublicPort] = port1;
this[kPublicPort].on('message', (message) => this.emit('message', message));
// ...

this[kPort].postMessage({
  type: 'loadScript',
  filename,
  doEval: !!options.eval,
  cwdCounter: cwdCounter || workerIo.sharedCwdCounter,
  workerData: options.workerData,
  publicPort: port2,
  // ...
  hasStdin: !!options.stdin
}, [port2]);

this[kPort] in the code above is the initialization script's end of the IMC. Although the worker initialization script writes data to the IMC at this point, the worker execution script cannot access that data yet, because it has not started running.
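The same pattern of sending one end of a channel inside a message, with the port listed in the transfer list, is also available to user-level code. The sketch below is not the internal implementation; it only mirrors the idea with the public API. The helper name `receiveOverPrivateChannel` and the message text are made up for this example.

```javascript
const { Worker, MessageChannel } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // The transferred port arrives as a live MessagePort inside the message.
  parentPort.once('message', ({ port }) => {
    port.postMessage('hello over a private channel');
    port.close();
  });
`;

// Hypothetical helper: creates a channel, hands port2 to a worker,
// and resolves with whatever the worker writes to that port.
function receiveOverPrivateChannel() {
  return new Promise((resolve, reject) => {
    const { port1, port2 } = new MessageChannel();
    port1.once('message', resolve);
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    // A port can only be moved to another thread if it is in the transfer list.
    worker.postMessage({ port: port2 }, [port2]);
  });
}

receiveOverPrivateChannel().then((message) => console.log(message));
```

After the transfer, port2 can no longer be used on the sending side; only the worker can use it.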

Run steps

At this point, initialization is complete; next the worker initialization script calls C++ and starts the worker thread.

1. A new V8 isolate is created and assigned to the worker. As mentioned earlier, a V8 isolate is a separate instance of the Chrome V8 runtime. This isolates the worker thread's execution context from the rest of your application code.

2. libuv is initialized. This ensures that the worker thread maintains its own event loop, independent of the rest of the application.

3. The worker execution script is executed and the worker's event loop is started.

4. The worker execution script calls C++ and reads the initialization metadata from the IMC.

5. The worker execution script executes the corresponding file or code (worker-simple.js in our case), which starts running as a worker.
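From user-level code, these stages can be observed only indirectly, through the events a Worker instance emits: 'online' once the worker has started executing JavaScript, and 'exit' once it has stopped. The following sketch records them; the helper name `traceLifecycle` is an assumption made for this example.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical helper: records the lifecycle events of a short-lived worker.
function traceLifecycle() {
  return new Promise((resolve, reject) => {
    const events = [];
    const worker = new Worker(
      'require("worker_threads").parentPort.postMessage("hi")',
      { eval: true }
    );
    worker.on('online', () => events.push('online'));   // worker started running JS
    worker.on('message', () => events.push('message')); // user-level message arrived
    worker.on('error', reject);
    worker.on('exit', (code) => {
      events.push(`exit:${code}`);                       // thread has stopped
      resolve(events);
    });
  });
}

traceLifecycle().then((events) => console.log(events));
```

The recorded list ends with `exit:0` for a worker that finishes normally.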

Take a look at the following code snippet to see how the worker execution script reads data from the IMC:

const publicWorker = require('worker_threads');

// ...

port.on('message', (message) => {
  if (message.type === 'loadScript') {
    const {
      cwdCounter,
      filename,
      doEval,
      workerData,
      publicPort,
      manifestSrc,
      manifestURL,
      hasStdin
    } = message;

    // ...
    initializeCJSLoader();
    initializeESMLoader();
    
    publicWorker.parentPort = publicPort;
    publicWorker.workerData = workerData;

    // ...
    
    port.unref();
    port.postMessage({ type: UP_AND_RUNNING });
    if (doEval) {
      const { evalScript } = require('internal/process/execution');
      evalScript('[worker eval]', filename);
    } else {
      process.argv[1] = filename; // script filename
      require('module').runMain();
    }
  }
  // ...

Did you notice in the above snippet that the workerData and parentPort properties are assigned to the publicWorker object? That object is the one returned by require('worker_threads') in the worker execution script.

That's why the workerData and parentPort properties are only available inside the child worker thread, but not in the parent worker's code.

If you try to access either property in the parent worker code, null will be returned.
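This is easy to check. The sketch below captures what the main thread sees on the worker_threads module and then asks a worker to report the same properties. The names `mainView` and `inspectWorkerView` are invented for this illustration.

```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// What the main thread sees: no parent port and no worker data.
const mainView = { isMainThread, parentPort, workerData };

const workerSource = `
  const { isMainThread, parentPort, workerData } = require('worker_threads');
  parentPort.postMessage({
    isMainThread,
    hasParentPort: parentPort !== null,
    workerData,
  });
`;

// Hypothetical helper: resolves with the values seen inside a worker.
function inspectWorkerView() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { num: 5 } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

console.log('main thread:', mainView);
inspectWorkerView().then((view) => console.log('worker thread:', view));
```

When run as the main script, parentPort and workerData are both null in `mainView`, while the worker reports a parent port and receives `{ num: 5 }`.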

Make full use of worker threads

Now that we understand how Node.js worker threads work, we can use that knowledge to get the best performance out of them. When writing applications more complex than worker-simple.js, there are two main concerns to keep in mind:

1. Although worker threads are more lightweight than real processes, spawning a worker still involves some heavy work and can be expensive if done frequently.

2. It is not cost-effective to use worker threads to parallelize I/O operations, because Node.js's native asynchronous I/O is faster than starting a worker thread from scratch to do the same thing.

To overcome the first concern, we need to implement a "worker thread pool".

Worker thread pool

The Node.js worker thread pool is a set of worker threads that are running and can be used by subsequent tasks. When a new task arrives, it can be passed to an available worker via the parent-child message channel. Once the task is completed, the child worker can communicate the result back to the parent worker through the same message channel.

When implemented properly, thread pools can significantly improve performance by reducing the overhead of creating new threads. It's also worth noting that since the number of parallel threads that can be effectively run is always limited by the hardware, creating a huge number of threads is also unlikely to work well.
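To make the idea concrete, here is a minimal sketch of such a pool. It is not a production implementation: the class name `WorkerPool`, its methods, and the squaring task are all assumptions made for this example, and a crashed worker is simply dropped instead of being replaced.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical task: every pooled worker squares the numbers it receives.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (num) => parentPort.postMessage(num * num));
`;

class WorkerPool {
  constructor(size) {
    this.queue = [];   // tasks waiting for a free worker
    this.idle = [];    // workers ready to take a task
    this.workers = []; // every worker, kept for shutdown
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queues a task and resolves with the worker's result.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this.dispatch();
    });
  }

  // Hands queued tasks to idle workers until one of the two runs out.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { input, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker); // reuse the worker for the next task
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        reject(err); // a crashed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(input);
    }
  }

  // Stops all workers so the process can exit.
  destroy() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all([1, 2, 3, 4, 5].map((n) => pool.run(n)))
  .then((squares) => console.log(squares))
  .finally(() => pool.destroy());
```

This prints `[ 1, 4, 9, 16, 25 ]`. Five tasks are served by only two workers, so the cost of starting a thread is paid twice instead of five times.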

The following figure compares the performance of three Node.js servers, each of which receives a string and returns its bcrypt hash computed with 12 salt rounds. The three servers are:

  • No multithreading
  • Multithreading, no thread pool
  • A thread pool with 4 threads

At a glance, it can be seen that using a thread pool has significantly less overhead as the load grows.

However, as of the time of writing, thread pools are not a native feature of Node.js out of the box. Therefore, you still have to rely on third-party implementations or write your own worker pool.

Hopefully you now have a good understanding of how worker threads work and can start experimenting and leveraging worker threads to write your CPU-bound applications.
