Node.js module system source code analysis

Overview

The emergence of Node.js has allowed front-end engineers to move beyond the client and work on the server. A new runtime also brings new modules, new features, and even new ways of thinking. This article walks through the design ideas behind the module system of Node.js (hereinafter Node) and analyzes the implementation of some of its core source code.

CommonJS Specification

Node originally implemented its module system by following the CommonJS specification, while adding some customizations that depart from it. CommonJS is a module format designed to solve JavaScript's scope problem: it lets each module execute in its own namespace.

The specification requires modules to expose variables or functions through module.exports, to import the output of other modules into the current module's scope through require(), and to follow these conventions:

  • In a module, a free variable named require must be available, and it must be a function. require accepts a module identifier and returns the exported API of that external module. If the requested module cannot be returned, require must throw an error.
  • In a module, there must be a free variable called exports, which is an object. While the module executes, it can attach its properties to exports. Modules must use the exports object as their only means of exporting.
  • In a module, there must be a free variable called module, which is also an object. The module object must have an id property holding the module's top-level id. The id must be such that require(module.id) returns the exports object of the module that module.id came from (that is, module.id can be passed to another module, and requiring it there must return the original module).

Node's implementation of the CommonJS specification

Node provides a require function in every module, along with the equivalent module.require method, for loading modules.

In the Node module system, each file is considered a separate module. When a module is loaded, it is initialized as an instance of the Module object. The basic implementation and properties of the Module object are as follows:

function Module(id = "", parent) {
  // Module id, usually the absolute path of the module
  this.id = id;
  this.path = path.dirname(id);
  this.exports = {};
  // The module that required this one (the caller)
  this.parent = parent;
  updateChildren(parent, this, false);
  this.filename = null;
  // Whether the module has finished loading
  this.loaded = false;
  // Modules required by the current module
  this.children = [];
}

Each module exposes its exports property as its public interface.
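To observe some of these properties on a real instance, the sketch below writes a throwaway module into a temporary directory and requires it. It assumes the snippet runs as a CommonJS script under Node; the file name `child.js` is arbitrary. For a module that is not the entry point, module.id is simply its resolved filename.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// realpath so the path matches the symlink-resolved key used by require.cache
const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "cjs-demo-")));
const childFile = path.join(dir, "child.js");
// The child reports on its own `module` object while it is executing
fs.writeFileSync(
  childFile,
  "module.exports = { id: module.id, filename: module.filename, loadedWhileRunning: module.loaded };"
);

const info = require(childFile);
// After require() returns, the cached instance is marked as loaded
const cached = require.cache[childFile];
console.log(info, cached.loaded);
```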

Module exports and imports

In Node, you can export a variable or function as a whole by assigning it to module.exports, or you can attach the variables or functions to be exported as properties of the exports object, as in the following code:

// 1. Using exports: I usually use it to export utility functions or constants
exports.name = 'xiaoxiang';
exports.add = (a, b) => a + b;

// 2. Using module.exports: export an entire object or a single function
const add = (a, b) => a + b;
const minus = (a, b) => a - b;
module.exports = {
  add,
  minus
};

Modules are loaded with the require function. You can pass a module name, a relative path, or an absolute path. When the module file's extension is .js, .json, or .node, the extension can be omitted, as in the following code:

// Reference modules
const { add, minus } = require('./module');
const a = require('/usr/app/module');
const http = require('http');

Note:

The exports variable is available in the module's file-level scope and is assigned the value of module.exports before the module is executed.

exports.name = 'test';
console.log(module.exports.name); // test
module.exports.name = 'test';
console.log(exports.name); // test

If exports is given a new value, it will no longer be bound to module.exports, and vice versa:

exports = { name: 'test' };
console.log(module.exports.name, exports.name); // undefined, test

When module.exports is replaced entirely by a new object, it is common to reassign exports as well, so that both names refer to the same object:

module.exports = exports = { name: 'test' };
console.log(module.exports.name, exports.name) // test, test
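These binding rules follow from the fact that exports is just a local variable that starts out pointing at module.exports, while require returns module.exports. The sketch below imitates that with a hypothetical runModule helper built on the Function constructor; it is a simplification of Node's wrapper, not the real one.

```javascript
// Hypothetical helper: run `body` the way a CommonJS wrapper would,
// and return what require() would return (module.exports).
function runModule(body) {
  const module = { exports: {} };
  const fn = new Function("exports", "module", body);
  fn(module.exports, module);
  return module.exports;
}

// Mutating the shared object is visible through module.exports
const a = runModule("exports.name = 'test';");
// Reassigning exports only rebinds the local variable
const b = runModule("exports = { name: 'test' };");
// Reassigning both keeps them pointing at the same object
const c = runModule("module.exports = exports = { name: 'test' };");
console.log(a.name, b.name, c.name);
```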

Module system implementation analysis: locating modules

The following is the code implementation of the require function:

// require entry function
Module.prototype.require = function(id) {
  //...
  requireDepth++;
  try {
    // Load the module
    return Module._load(id, this, /* isMain */ false);
  } finally {
    requireDepth--;
  }
};

The above code receives the given module path; requireDepth records how deeply module loads are currently nested. The class method Module._load implements the main logic of how Node loads modules. Let's walk through the source of Module._load; for your convenience, I have added comments to the code.

Module._load = function(request, parent, isMain) {
  // Step 1: Resolve the full path of the module
  const filename = Module._resolveFilename(request, parent, isMain);

  // Step 2: Load the module, which falls into three cases.
  // Case 1: If the module is cached, return its exports property directly
  const cachedModule = Module._cache[filename];
  if (cachedModule !== undefined)
    return cachedModule.exports;
  // Case 2: Load a built-in module
  const mod = loadNativeModule(filename, request);
  if (mod && mod.canBeRequiredByUsers) return mod.exports;
  // Case 3: Create a new module instance
  const module = new Module(filename, parent);
  // Cache the instance before loading it, so a circular require gets the partially filled exports
  Module._cache[filename] = module;

  // Step 3: Load the module file
  module.load(filename);

  // Step 4: Return the export object
  return module.exports;
};

Loading strategy

The above code packs in a lot of information. We will focus on the following questions:

What is the module caching strategy? From the code above, _load applies a different strategy in each of three cases:

  • Case 1: Cache hit: return the cached exports directly.
  • Case 2: Built-in module: return its exposed exports property, which is an alias of module.exports.
  • Case 3: Module built from a file or third-party code: create it, cache it, and return its exports at the end, so that the next time the same module is required, the cached copy is used instead of reloading it.
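The cache behind Case 1 and Case 3 is exposed to user code as require.cache, keyed by resolved filename. A minimal sketch, assuming a CommonJS script under Node; `counter.js` is a throwaway file created for the demo:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Each execution of this file produces a brand-new exports object
const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "cache-demo-")));
const file = path.join(dir, "counter.js");
fs.writeFileSync(file, "module.exports = { createdAt: Math.random() };");

const first = require(file);
const second = require(file); // Case 1: cache hit, same object

// Evicting the entry forces Case 3 (create, cache, load) on the next require
delete require.cache[file];
const third = require(file);

console.log(first === second, first === third);
```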

How does Module._resolveFilename(request, parent, isMain) resolve the file name?

Let's look at how this class method is defined:

Module._resolveFilename = function(request, parent, isMain, options) {
 if (NativeModule.canBeRequiredByUsers(request)) {
   // Prioritize loading built-in modules
   return request;
 }
 let paths;

 // options is used by require.resolve; options.paths specifies the search paths
 if (typeof options === "object" && options !== null) {
   if (ArrayIsArray(options.paths)) {
     const isRelative =
       request.startsWith("./") ||
       request.startsWith("../") ||
       (isWindows && request.startsWith(".\\")) ||
       request.startsWith("..\\");
     if (isRelative) {
       paths = options.paths;
     } else {
       const fakeParent = new Module("", null);
       paths = [];
       for (let i = 0; i < options.paths.length; i++) {
         const path = options.paths[i];
         fakeParent.paths = Module._nodeModulePaths(path);
         const lookupPaths = Module._resolveLookupPaths(request, fakeParent);
         for (let j = 0; j < lookupPaths.length; j++) {
           if (!paths.includes(lookupPaths[j])) paths.push(lookupPaths[j]);
         }
       }
     }
   } else if (options.paths === undefined) {
     paths = Module._resolveLookupPaths(request, parent);
   } else {
        //...
   }
 } else {
   // Find the paths in which the module may exist
   paths = Module._resolveLookupPaths(request, parent);
 }
 // Find the module path from the request, the search paths, and whether it is the entry module
 const filename = Module._findPath(request, paths, isMain);
 if (!filename) {
   const requireStack = [];
   for (let cursor = parent; cursor; cursor = cursor.parent) {
     requireStack.push(cursor.filename || cursor.id);
   }
   // Module not found, throw an exception (is this a familiar error?)
   let message = `Cannot find module '${request}'`;
   if (requireStack.length > 0) {
     message = message + "\nRequire stack:\n- " + requireStack.join("\n- ");
   }

   const err = new Error(message);
   err.code = "MODULE_NOT_FOUND";
   err.requireStack = requireStack;
   throw err;
 }
 // Finally, return the full path including the file name
 return filename;
};
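The error branch at the end of _resolveFilename is easy to observe from user code through require.resolve, which runs the same resolution without loading the module, and the options.paths branch corresponds to require.resolve(request, { paths }). A minimal sketch, assuming a CommonJS script under Node; `definitely-not-installed-xyz` is a hypothetical name assumed not to exist, and `demo-pkg` is a throwaway package created for the demo:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Case A: resolution fails, producing the familiar MODULE_NOT_FOUND error
let notFound;
try {
  require.resolve("definitely-not-installed-xyz");
} catch (err) {
  notFound = err;
}

// Case B: options.paths makes lookup start from the node_modules under a custom directory
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "resolve-demo-")));
const pkgDir = path.join(root, "node_modules", "demo-pkg");
fs.mkdirSync(pkgDir, { recursive: true });
fs.writeFileSync(path.join(pkgDir, "index.js"), "module.exports = 42;");
const resolved = require.resolve("demo-pkg", { paths: [root] });

console.log(notFound.code, resolved);
```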

The key parts of _resolveFilename are its calls to the _resolveLookupPaths and _findPath methods.

_resolveLookupPaths: given a module request and the calling (parent) module, returns the array of directories that _findPath will search.

// Build the array of directories in which to look for the module file
Module._resolveLookupPaths = function(request, parent) {
  if (NativeModule.canBeRequiredByUsers(request)) {
    debug("looking for %j in []", request);
    return null;
  }

  // If it is not a relative path
  if (
    request.charAt(0) !== "." ||
    (request.length > 1 &&
      request.charAt(1) !== "." &&
      request.charAt(1) !== "/" &&
      (!isWindows || request.charAt(1) !== "\\"))
  ) {
    /**
     * Check the node_modules folders.
     * modulePaths holds the global folders: the directories in the NODE_PATH
     * environment variable, ~/.node_modules, ~/.node_libraries, and
     * <prefix>/lib/node under the Node installation.
     */
    let paths = modulePaths;

    if (parent != null && parent.paths && parent.paths.length) {
      // Prepend the parent module's node_modules chain, so lookup walks up from the caller's directory
      paths = parent.paths.concat(paths);
    }

    return paths.length > 0 ? paths : null;
  }

  // With no parent filename (e.g. in the REPL), search ., the node_modules chain from ., and modulePaths in turn
  if (!parent || !parent.id || !parent.filename) {
    const mainPaths = ["."].concat(Module._nodeModulePaths("."), modulePaths);

    return mainPaths;
  }

  // For a relative path, search the parent module's directory
  const parentDir = [path.dirname(parent.filename)];
  return parentDir;
};
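The node_modules chain that ends up in parent.paths comes from Module._nodeModulePaths, which walks from a directory up to the filesystem root. The sketch below is a simplified, hypothetical re-implementation: nodeModulePaths and lookupPaths are illustrative names, POSIX paths are assumed, and Node's special handling of directories that are themselves named node_modules is omitted. It mirrors the two main branches of _resolveLookupPaths shown above.

```javascript
const path = require("path").posix;

// Simplified Module._nodeModulePaths: one node_modules entry per ancestor directory
function nodeModulePaths(from) {
  const result = [];
  let dir = path.resolve(from);
  while (true) {
    result.push(path.join(dir, "node_modules"));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return result;
}

// Simplified _resolveLookupPaths: relative requests search only the parent's
// directory; bare names search the parent's node_modules chain, then the globals.
function lookupPaths(request, parentFilename, globalPaths) {
  const isRelative =
    request === "." || request === ".." ||
    request.startsWith("./") || request.startsWith("../");
  if (isRelative) return [path.dirname(parentFilename)];
  return nodeModulePaths(path.dirname(parentFilename)).concat(globalPaths);
}

console.log(lookupPaths("lodash", "/app/src/index.js", []));
```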

_findPath: searches the directories returned by _resolveLookupPaths for the requested module and returns the resolved filename.

// Find the real path of the module from the request, the search paths, and whether it is the entry module
Module._findPath = function(request, paths, isMain) {
 const absoluteRequest = path.isAbsolute(request);
 if (absoluteRequest) {
   // Absolute path: locate the module directly
   paths = [""];
 } else if (!paths || paths.length === 0) {
   return false;
 }
 const cacheKey =
   request + "\x00" + (paths.length === 1 ? paths[0] : paths.join("\x00"));
 // Check the path cache
 const entry = Module._pathCache[cacheKey];
 if (entry) return entry;
 let exts;
 let trailingSlash =
   request.length > 0 &&
   request.charCodeAt(request.length - 1) === CHAR_FORWARD_SLASH; // '/'
 if (!trailingSlash) {
   trailingSlash = /(?:^|\/)\.?\.$/.test(request);
 }
 // For each path
 for (let i = 0; i < paths.length; i++) {
   const curPath = paths[i];
   if (curPath && stat(curPath) < 1) continue;
   const basePath = resolveExports(curPath, request, absoluteRequest);
   let filename;
   const rc = stat(basePath);
   if (!trailingSlash) {
     if (rc === 0) { // stat returned 0, so it is a file
       if (!isMain) {
         if (preserveSymlinks) {
           // Instruct the module loader to maintain symbolic links when resolving and caching modules.
           filename = path.resolve(basePath);
         } else {
           // Do not keep symbolic links
           filename = toRealPath(basePath);
         }
       } else if (preserveSymlinksMain) {
         filename = path.resolve(basePath);
       } else {
         filename = toRealPath(basePath);
       }
     }
     if (!filename) {
       if (exts === undefined) exts = ObjectKeys(Module._extensions);
       // Try each registered extension
       filename = tryExtensions(basePath, exts, isMain);
     }
   }
   if (!filename && rc === 1) {
     /**
      * stat returned 1 and no file was found, so treat the path as a folder.
      * Try the file named by the main field of package.json in that folder;
      * if that does not exist, try index[.js, .json, .node].
      */
     if (exts === undefined) exts = ObjectKeys(Module._extensions);
     filename = tryPackage(basePath, exts, isMain, request);
   }
   if (filename) {
     // If the file exists, add the file name to the path cache
     Module._pathCache[cacheKey] = filename;
     return filename;
   }
 }
 const selfFilename = trySelf(paths, exts, isMain, trailingSlash, request);
 if (selfFilename) {
   // Set the path cache
   Module._pathCache[cacheKey] = selfFilename;
   return selfFilename;
 }
 return false;
};
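Stripped of symlink handling, the path cache, package exports and the self-reference lookup, the core loop of _findPath is: for each search directory, try the exact file, then the file with each registered extension, then treat the path as a folder. A minimal sketch under those assumptions; findPath, tryExtensions and the fixed extension list are hypothetical simplifications, and folder handling is reduced to an index lookup (package.json main is left out):

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const EXTENSIONS = [".js", ".json", ".node"]; // same order as Module._extensions

// Returns 0 for a file, 1 for a directory, -1 if missing (like the internal stat)
function stat(p) {
  try {
    return fs.statSync(p).isDirectory() ? 1 : 0;
  } catch {
    return -1;
  }
}

function tryExtensions(basePath, exts) {
  for (const ext of exts) {
    if (stat(basePath + ext) === 0) return basePath + ext;
  }
  return false;
}

function findPath(request, paths) {
  const searchPaths = path.isAbsolute(request) ? [""] : paths;
  for (const dir of searchPaths) {
    const basePath = path.resolve(dir, request);
    const rc = stat(basePath);
    let filename = rc === 0 ? basePath : tryExtensions(basePath, EXTENSIONS);
    if (!filename && rc === 1) {
      // Folder: fall back to index.js / index.json / index.node
      filename = tryExtensions(path.join(basePath, "index"), EXTENSIONS);
    }
    if (filename) return filename;
  }
  return false;
}

// Usage: a throwaway tree containing util.js and lib/index.json
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "find-demo-")));
fs.writeFileSync(path.join(root, "util.js"), "");
fs.mkdirSync(path.join(root, "lib"));
fs.writeFileSync(path.join(root, "lib", "index.json"), "{}");
console.log(findPath("./util", [root]), findPath("./lib", [root]));
```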

Module loading

Standard module processing

As _findPath shows, when the module path is a folder, the tryPackage function is called. Here is a brief look at its implementation.

// Try to load a standard module (a folder containing package.json)
function tryPackage(requestPath, exts, isMain, originalPath) {
  const pkg = readPackageMain(requestPath);
  if (!pkg) {
    // If there is no package.json (or it has no main field), index is the default entry file
    return tryExtensions(path.resolve(requestPath, "index"), exts, isMain);
  }
  const filename = path.resolve(requestPath, pkg);
  let actual =
    tryFile(filename, isMain) ||
    tryExtensions(filename, exts, isMain) ||
    tryExtensions(path.resolve(filename, "index"), exts, isMain);
  //...
  return actual;
}
// Read the main field in package.json
function readPackageMain(requestPath) {
  const pkg = readPackage(requestPath);
  return pkg ? pkg.main : undefined;
}

The readPackage function reads and parses the contents of package.json, as shown below:

function readPackage(requestPath) {
  const jsonPath = path.resolve(requestPath, "package.json");
  const existing = packageJsonCache.get(jsonPath);
  if (existing !== undefined) return existing;
  // Read package.json via libuv (uv_fs_open); the result is cached below
  const json = internalModuleReadJSON(path.toNamespacedPath(jsonPath));
  if (json === undefined) {
    // No package.json: cache the miss as false
    packageJsonCache.set(jsonPath, false);
    return false;
  }
  //...
  try {
    const parsed = JSONParse(json);
    const filtered = {
      name: parsed.name,
      main: parsed.main,
      exports: parsed.exports,
      type: parsed.type
    };
    packageJsonCache.set(jsonPath, filtered);
    return filtered;
  } catch (e) {
    //...
  }
}
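Putting tryPackage and readPackage together: a folder's entry is the main field of its package.json if that resolves, otherwise index with a registered extension. The sketch below uses a hypothetical resolveDirectory helper with simplifications (no package.json cache, no exports field, only .js and .json, and no error handling for a main that does not resolve) to reproduce that fallback order.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const EXTS = [".js", ".json"];

function isFile(p) {
  try {
    return fs.statSync(p).isFile();
  } catch {
    return false;
  }
}

function tryExtensions(base) {
  for (const ext of EXTS) if (isFile(base + ext)) return base + ext;
  return false;
}

// Like readPackageMain: the "main" field, or undefined when there is no package.json
function readPackageMain(dir) {
  try {
    return JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8")).main;
  } catch {
    return undefined;
  }
}

function resolveDirectory(dir) {
  const main = readPackageMain(dir);
  if (!main) return tryExtensions(path.resolve(dir, "index"));
  const filename = path.resolve(dir, main);
  return (
    (isFile(filename) && filename) || // main names an exact file
    tryExtensions(filename) || // main without an extension
    tryExtensions(path.resolve(filename, "index")) // main names a folder
  );
}

// Usage: one package with "main", one without
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), "pkg-demo-")));
const withMain = path.join(root, "with-main");
const noMain = path.join(root, "no-main");
fs.mkdirSync(path.join(withMain, "lib"), { recursive: true });
fs.writeFileSync(path.join(withMain, "package.json"), JSON.stringify({ main: "lib/entry" }));
fs.writeFileSync(path.join(withMain, "lib", "entry.js"), "");
fs.mkdirSync(noMain);
fs.writeFileSync(path.join(noMain, "index.js"), "");
console.log(resolveDirectory(withMain), resolveDirectory(noMain));
```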

The tryPackage and readPackage code above shows the role of package.json: its main field configures the module's entry point, and when no main is given, the module's default entry file is index. The specific process is shown in the figure below:

Module file processing

Once a module has been located, how is it loaded and parsed? Here is the code:

Module.prototype.load = function(filename) {
  // Ensure that the module has not been loaded yet
  assert(!this.loaded);
  this.filename = filename;
  // Compute the node_modules lookup paths starting from the module's folder
  this.paths = Module._nodeModulePaths(path.dirname(filename));
  const extension = findLongestRegisteredExtension(filename);
  //...
  // Run the handler for the file extension, such as .js / .json / .node
  Module._extensions[extension](this, filename);
  // Mark the module as loaded successfully
  this.loaded = true;
  // ... ESM module support omitted
};
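The extension key used for dispatch comes from findLongestRegisteredExtension. The sketch below is a hedged re-implementation based on my reading of the Node source, with the registry passed in as a plain object (`handlers` is a hypothetical parameter standing in for Module._extensions): it scans the basename's dots from left to right, skips a leading dot, returns the first registered suffix, and falls back to .js.

```javascript
const path = require("path");

// Scanning dots left to right means the longest registered suffix is found first
function findLongestRegisteredExtension(filename, handlers) {
  const name = path.basename(filename);
  let startIndex = 0;
  let index;
  while ((index = name.indexOf(".", startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // skip dotfiles such as .eslintrc
    const ext = name.slice(index);
    if (Object.prototype.hasOwnProperty.call(handlers, ext)) return ext;
  }
  return ".js"; // unknown or missing extensions are treated as JavaScript
}

const handlers = { ".js": true, ".json": true, ".node": true, ".test.js": true };
console.log(findLongestRegisteredExtension("/src/app.test.js", handlers));
```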

Suffix processing

As you can see, Node.js loads files differently depending on their extension. Here is a brief look at .js, .json, and .node.

Files with the .js extension are read mainly through Node's built-in fs.readFileSync API.

Module._extensions[".js"] = function(module, filename) {

  // Read the file content
  const content = fs.readFileSync(filename, "utf8");
  // Compile and execute the code
  module._compile(content, filename);
};

Handling .json files is relatively simple: after reading the file content, Node runs JSONParse on it to get the result.

Module._extensions[".json"] = function(module, filename) {
  // Read the file directly as UTF-8
  const content = fs.readFileSync(filename, "utf8");
  //...
  try {
    // Export the file contents as a parsed JSON object
    module.exports = JSONParse(stripBOM(content));
  } catch (err) {
    //...
  }
};
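The JSON handler is small enough to reproduce. stripBOM removes a leading U+FEFF byte-order mark, which JSON.parse would otherwise reject. A minimal sketch; loadJson is a hypothetical stand-in for the .json handler that returns the parsed value instead of assigning module.exports, and the file-name prefix on parse errors reflects my understanding of what the elided catch branch does:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Remove a UTF-8 byte-order mark, which appears as U+FEFF after decoding
function stripBOM(content) {
  return content.charCodeAt(0) === 0xfeff ? content.slice(1) : content;
}

function loadJson(filename) {
  const content = fs.readFileSync(filename, "utf8");
  try {
    return JSON.parse(stripBOM(content));
  } catch (err) {
    // Prefix parse errors with the offending file name
    err.message = filename + ": " + err.message;
    throw err;
  }
}

// Usage: a JSON file saved with a BOM, as some Windows editors do
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "json-demo-"));
const file = path.join(dir, "config.json");
fs.writeFileSync(file, "\ufeff" + JSON.stringify({ port: 8080 }), "utf8");
console.log(loadJson(file).port);
```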

A file with the .node extension is a native addon implemented in C/C++ and is loaded by the process.dlopen function. process.dlopen calls the DLOpen function in Node's C++ code, which in turn calls uv_dlopen to load the .node file, much as the operating system loads a shared library.

Module._extensions[".node"] = function(module, filename) {
  //...
  return process.dlopen(module, path.toNamespacedPath(filename));
};

Of the three extension handlers (.js, .json, .node), only the .js handler ends up calling the instance method _compile. Below, with some experimental features and debugging logic removed, is a brief analysis of that method.

Compile and execute

Once the file content has been read, Node uses the compilation APIs provided by the V8 engine to turn the module code into a function and then runs that function. The code is as follows:

Module.prototype._compile = function(content, filename) {
  let moduleURL;
  let redirects;
  // Compile the content into a function that receives __dirname / __filename / module / exports / require
  const compiledWrapper = wrapSafe(filename, content, this);
  const dirname = path.dirname(filename);
  const require = makeRequireFunction(this, redirects);
  let result;
  const exports = this.exports;
  const thisValue = exports;
  const module = this;
  if (requireDepth === 0) statCache = new Map();
  //...
  // Execute the module function, with this bound to exports
  result = compiledWrapper.call(
    thisValue,
    exports,
    require,
    module,
    filename,
    dirname
  );
  hasLoadedAnyUserCJSModule = true;
  if (requireDepth === 0) statCache = null;
  return result;
};
// Core logic: compile the module source into a function whose parameters are the injected variables
function wrapSafe(filename, content, cjsModuleInstance) {
  if (patched) {
    const wrapper = Module.wrap(content);
    // Module.wrap was patched by user code: wrap the source as a string and run it with vm,
    // whose runInThisContext is backed by the C++ RunInThisContext binding
    return vm.runInThisContext(wrapper, {
      filename,
      lineOffset: 0,
      displayErrors: true,
      // Support dynamic import() inside the module
      importModuleDynamically: async specifier => {
        const loader = asyncESM.ESMLoader;
        return loader.import(specifier, normalizeReferrerURL(filename));
      }
    });
  }
  let compiled;
  try {
    compiled = compileFunction(
      content,
      filename,
      0,
      0,
      undefined,
      false,
      undefined,
      [],
      ["exports", "require", "module", "__filename", "__dirname"]
    );
  } catch (err) {
    //...
  }
  const { callbackMap } = internalBinding("module_wrap");
  callbackMap.set(compiled.cacheKey, {
    importModuleDynamically: async specifier => {
      const loader = asyncESM.ESMLoader;
      return loader.import(specifier, normalizeReferrerURL(filename));
    }
  });
  return compiled.function;
}

In the above code, _compile calls wrapSafe, which injects the __dirname / __filename / module / exports / require variables by compiling the module source into a function that takes them as parameters. Normally this goes through compileFunction; if Module.wrap has been patched, the wrapped source string is run with vm.runInThisContext instead. Both are backed by C++ code in src/node_contextify.cc. wrapSafe returns the compiled function as compiledWrapper, and _compile finally runs the module by calling compiledWrapper.call with exports as this.

That concludes this source code analysis of the Node.js module system. For more on the topic, see other related articles on 123WORDPRESS.COM.
