Overview

For many years, Node.js was not the best choice for implementing highly CPU-intensive applications, mainly because of JavaScript's single-threaded nature. As a solution to this problem, Node.js v10.5.0 introduced the experimental concept of "worker threads" through the worker_threads module, and it became a stable feature starting with Node.js v12 LTS. This article explains how worker threads work and how to get the best performance out of them.

The history of CPU-bound applications in Node.js

Before worker threads, there were multiple ways to execute CPU-intensive applications in Node.js. Some of them are:
However, due to performance limitations, additional complexity, low adoption, and poor documentation, none of these solutions have been widely adopted.

Use worker threads for CPU-intensive operations

Although worker_threads is an elegant solution to JavaScript's concurrency problem, it does not bring multithreading features to the JavaScript language itself. Instead, worker_threads achieves concurrency by running your application using multiple isolated JavaScript workers. Node provides the communication between each worker and its parent.

Are you confused? In Node.js, each worker has its own V8 instance and event loop. However, unlike child_process, workers can share memory (for example, through a SharedArrayBuffer).

These concepts will be explained later. Let's first take a quick look at how to use worker threads. A naive use case looks like this:

```javascript
// worker-simple.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Parent: start a worker that runs this same file, passing it some data.
  const worker = new Worker(__filename, { workerData: { num: 5 } });
  worker.once('message', (result) => {
    console.log('square of 5 is :', result);
  });
} else {
  // Child: compute the square and send it back to the parent.
  parentPort.postMessage(workerData.num * workerData.num);
}
```

In the example above, we pass a number to a separate worker, which calculates its square. After the computation, the child worker sends the result back to the main thread. Despite its apparent simplicity, this can be a bit confusing for people new to Node.js.

How do worker threads work?

The JavaScript language does not have multithreading features. Therefore, Node.js worker threads behave differently from the traditional multithreading found in many other high-level languages.

In Node.js, a worker's responsibility is to execute a piece of code (the worker script) provided by its parent worker. The worker script runs in isolation from other workers and can pass messages between itself and its parent. The worker script can be either a standalone file or a string of code that is evaluated with eval.
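The following is a minimal sketch of the second option: an eval-based worker that computes the same square as worker-simple.js. It relies on the { eval: true } option of the Worker constructor, which tells Node to treat the first argument as source code instead of a file path. squareInWorker is a hypothetical helper name and is not part of the worker_threads API. Wrapping the worker in a Promise also shows one way to handle its 'error' and 'exit' events.

```javascript
// eval-worker.js: the worker script is passed as a string, not a file path.
const { Worker } = require('worker_threads');

// Source text for the worker; with { eval: true } Node evaluates it directly.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  parentPort.postMessage(workerData.num * workerData.num);
`;

// Hypothetical helper: run one worker and resolve with the first message it sends.
function squareInWorker(num) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { num } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // A non-zero exit without a reply means the worker failed.
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

squareInWorker(5).then((result) => console.log('square of 5 is :', result));
```

The file-based form is usually easier to debug. The eval form is convenient when the worker body is small or generated at runtime.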
In our case, we use __filename as the worker script because both the parent and child worker code live in the same file. The isMainThread property determines which role the code plays.

Each worker is connected to its parent worker through a message channel. The child worker writes to the message channel with parentPort.postMessage(). The parent worker writes to it by calling worker.postMessage() on the worker instance. Take a look at Figure 1:

A message channel is a simple two-way communication channel with two ends, called "ports". In JavaScript/Node.js terminology, the two ends of a MessageChannel are called port1 and port2.

How do Node.js workers run in parallel?

Now comes the key question. JavaScript does not directly provide concurrency, so how can two Node.js workers run in parallel? The answer is V8 isolates.

A V8 isolate is a separate instance of the Chrome V8 runtime, with its own JS heap and microtask queue. This allows each Node.js worker to run its JavaScript code in complete isolation from other workers. The downside is that workers cannot directly access each other's heap data.

In addition, each worker has its own copy of the libuv event loop, independent of the parent worker and other workers.

Crossing the JS/C++ boundary

The C++ implementation of the worker handles two jobs: instantiating a new worker and providing communication with parent and sibling JS scripts. At the time of writing, that implementation lives in node_worker.cc (https://github.com/nodejs/node/blob/921493e228/src/node_worker.cc).

The worker implementation is exposed to user-level JavaScript through the worker_threads module. The JS implementation is split into two scripts, which I will refer to as:

1. Worker initialization script (lib/internal/worker.js): instantiates the worker and sets up the initial parent-child communication used to pass the worker's metadata from the parent to the child.
2. Worker execution script (lib/internal/main/worker_thread.js): runs the user's worker script with the user-provided workerData and the other metadata supplied by the parent.
Figure 2 explains this process more clearly:

Based on the above, we can divide the worker setup process into two stages:

1. Initialization
2. Running
Let's take a look at what happens at each stage.

Initialization steps

1. The user-level script creates a worker instance using worker_threads.
2. Node's parent worker initialization script calls into C++ and creates an empty worker object. At this point, the created worker is nothing more than a plain C++ object that has not been started yet.
3. When the C++ worker object is created, it generates a thread ID and assigns it to itself.
4. At the same time, the parent worker creates an empty initialization message channel (let's call it the IMC). This is shown in the grey “Initialisation Message Channel” section in Figure 2.
5. The worker initialization script creates a public JS message channel (let's call it the PMC). User-level JS uses this channel to pass messages between the parent and child workers. This part is mainly described in Figure 1 and is also marked in red in Figure 2.
6. The Node parent worker initialization script calls into C++ and writes the initial metadata to the IMC. This is the data that needs to be sent to the worker execution script.

What is the initial metadata? It is the data the execution script needs in order to start the worker, including the script name, the worker data, port2 of the PMC, and some other information. In our example, the initialization metadata amounts to a message like this:

"Hey, worker execution script! Could you please run worker-simple.js with the worker data {num: 5}? Please also pass it port2 of the PMC so that the worker can read data from the PMC."

The following snippet shows how the initialization data is written to the IMC:

```javascript
const kPublicPort = Symbol('kPublicPort');
// ...

const { port1, port2 } = new MessageChannel();
this[kPublicPort] = port1;
this[kPublicPort].on('message', (message) => this.emit('message', message));
// ...

this[kPort].postMessage({
  type: 'loadScript',
  filename,
  doEval: !!options.eval,
  cwdCounter: cwdCounter || workerIo.sharedCwdCounter,
  workerData: options.workerData,
  publicPort: port2,
  // ...
  hasStdin: !!options.stdin
}, [port2]); // port2 is in the transfer list, so ownership moves to the worker
```

In this code, this[kPort] is the initialization script's end of the IMC. Although the worker initialization script writes data to the IMC, the worker execution script cannot access that data yet, because it has not started running.

Run steps

At this point, initialization is complete. Next, the worker initialization script calls into C++ and starts the worker thread.

1. A new V8 isolate is created and assigned to the worker. As mentioned earlier, a V8 isolate is a separate instance of the Chrome V8 runtime. This isolates the worker thread's execution context from the rest of your application code.
2. libuv is initialized. This ensures that the worker thread has its own event loop, independent of the rest of the application.
3. The worker execution script runs, and the worker's event loop is started.
4. The worker execution script calls into C++ and reads the initialization metadata from the IMC.
5. The worker execution script runs the corresponding file or code (worker-simple.js in our case), which then starts running as a worker.

Take a look at the following snippet to see how the worker execution script reads data from the IMC:

```javascript
const publicWorker = require('worker_threads');
// ...

port.on('message', (message) => {
  if (message.type === 'loadScript') {
    const {
      cwdCounter,
      filename,
      doEval,
      workerData,
      publicPort,
      manifestSrc,
      manifestURL,
      hasStdin
    } = message;
    // ...
    initializeCJSLoader();
    initializeESMLoader();
    // Expose the PMC port and the worker data to user code.
    publicWorker.parentPort = publicPort;
    publicWorker.workerData = workerData;
    // ...
    port.unref();
    port.postMessage({ type: UP_AND_RUNNING });
    if (doEval) {
      const { evalScript } = require('internal/process/execution');
      evalScript('[worker eval]', filename);
    } else {
      process.argv[1] = filename; // script filename
      require('module').runMain();
    }
  }
  // ...
});
```

Did you notice in the snippet above that the workerData and parentPort properties are assigned to the publicWorker object?
The publicWorker object is what require('worker_threads') returns inside the worker execution script. That's why the workerData and parentPort properties are only populated inside the child worker thread and not in the parent worker's code. If you try to access either property in the parent worker's code, you will get null.

Make full use of worker threads

Now that we understand how Node.js worker threads work, we can use that knowledge to get the best performance out of them. When writing applications more complex than worker-simple.js, keep two main concerns in mind:

1. Although worker threads are more lightweight than actual processes, spawning a worker still involves a fair amount of work and can be expensive if done frequently.
2. It is not cost-effective to use worker threads to parallelize I/O operations. Node.js's native I/O mechanisms are faster than starting a worker thread from scratch to do the same thing.

To overcome the problem in point 1, we need to implement a "worker thread pool".

Worker thread pool

A Node.js worker thread pool is a group of running worker threads that subsequent tasks can reuse. When a new task arrives, it is passed to an available worker via the parent-child message channel. Once the task is complete, the child worker sends the result back to the parent worker through the same message channel.

When implemented properly, a thread pool can significantly improve performance by reducing the overhead of creating new threads. It's also worth noting that the hardware always limits how many threads can effectively run in parallel. Creating a huge number of threads is therefore unlikely to help.

The following figure compares the performance of three Node.js servers. Each receives a string and returns its Bcrypt hash with 12 salt rounds. The three servers are:
At a glance, it can be seen that using a thread pool has significantly less overhead as the load grows.

However, as of the time of writing, thread pools are not a native feature of Node.js out of the box. Therefore, you still have to rely on third-party implementations or write your own worker pool.

Hopefully you now have a good understanding of how worker threads work and can start experimenting with them to write your CPU-bound applications.
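If you decide to write your own pool, the following minimal sketch shows the basic shape. It is not the implementation used in the benchmark above, and it makes several assumptions. The worker body is passed as a string with { eval: true } so the sketch fits in one file. The CPU-bound task is a deliberately naive Fibonacci function standing in for real work such as Bcrypt hashing. WorkerPool, run(), and close() are hypothetical names. Workers are created once and reused. Each idle worker takes the next queued task over its message channel and handles one task at a time.

```javascript
// worker-pool.js: a minimal fixed-size worker pool (sketch).
const { Worker } = require('worker_threads');
const os = require('os');

// Worker body, evaluated from a string ({ eval: true }) so the sketch fits in
// one file. The naive Fibonacci is a stand-in for real CPU-bound work.
const workerSource = `
  const { parentPort } = require('worker_threads');
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  // Handle one task per message and reply with the result.
  parentPort.on('message', (n) => parentPort.postMessage(fib(n)));
`;

class WorkerPool {
  constructor(size = os.cpus().length) {
    this.workers = new Set();  // every live worker
    this.idle = [];            // workers waiting for a task
    this.queue = [];           // tasks waiting for a free worker
    this.running = new Map();  // worker -> task it is currently running
    this.closed = false;
    for (let i = 0; i < size; i++) this.spawn();
  }

  spawn() {
    const worker = new Worker(workerSource, { eval: true });
    this.workers.add(worker);
    worker.on('message', (result) => {
      // The worker finished its task: settle it, then reuse the worker.
      const task = this.running.get(worker);
      this.running.delete(worker);
      this.idle.push(worker);
      task.resolve(result);
      this.dispatch();
    });
    worker.on('error', (err) => {
      // An uncaught exception terminates the worker: fail its task and replace it.
      const task = this.running.get(worker);
      this.running.delete(worker);
      this.workers.delete(worker);
      if (task) task.reject(err);
      if (!this.closed) this.spawn();
    });
    this.idle.push(worker);
    this.dispatch();
  }

  // Queue a task; the returned promise settles when a worker replies.
  run(n) {
    return new Promise((resolve, reject) => {
      this.queue.push({ n, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const task = this.queue.shift();
      this.running.set(worker, task);
      worker.postMessage(task.n);
    }
  }

  // Stop all workers so the process can exit.
  close() {
    this.closed = true;
    return Promise.all([...this.workers].map((worker) => worker.terminate()));
  }
}

// Example: four CPU-bound tasks shared by two reused workers.
const demoPool = new WorkerPool(2);
Promise.all([30, 30, 30, 30].map((n) => demoPool.run(n)))
  .then((results) => console.log(results))
  .finally(() => demoPool.close());
```

Because each worker runs only one task at a time, the pool can match a reply to its task by worker alone. A design that sends several tasks to the same worker concurrently would need to tag each message with a task ID.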