How to implement parallel downloading of large files in JavaScript

Many of you probably already know the solution for uploading large files: to improve upload efficiency, we generally use the Blob.slice method to split the large file into chunks of a specified size, upload the chunks in parallel, and notify the server to merge the chunks once all of them have been uploaded successfully.

So can we adopt a similar idea for large file downloads? If the server supports the Range request header, we can also implement multi-threaded block downloading, as shown in the following figure:

The figure above should give you a general idea of the solution for downloading large files. Next, let's introduce HTTP range requests.

1. HTTP Range Request

An HTTP range request allows the server to send only a portion of an HTTP message to the client. Range requests are useful when transferring large media files or when used together with the resume feature of a file download. If the Accept-Ranges header is present in the response (and its value is not "none"), the server supports range requests.

In a Range header, you can request multiple parts at once, and the server will return them as a multipart response. If the server returns a range response, it uses the 206 Partial Content status code. If the requested range is invalid, the server returns a 416 Range Not Satisfiable status code, indicating a client error. The server is also allowed to ignore the Range header and return the entire file with a 200 status code.
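
Before relying on range requests, it is worth probing whether the target server actually supports them. The following is a minimal sketch (using the Fetch API, with a placeholder URL of your own choosing) that sends a HEAD request and checks the Accept-Ranges response header; note that for cross-origin resources the header is only readable if the server exposes it via Access-Control-Expose-Headers:

async function supportsRangeRequests(url) {
  // HEAD returns only the response headers, not the body
  const res = await fetch(url, { method: "HEAD" });
  const acceptRanges = res.headers.get("Accept-Ranges");
  // "bytes" means range requests are supported; "none" or a missing header means they are not guaranteed
  return acceptRanges === "bytes";
}

// Usage: supportsRangeRequests("https://example.org/big-file.zip").then(console.log);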

1.1 Range Syntax

Range: <unit>=<range-start>-
Range: <unit>=<range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>
Range: <unit>=<range-start>-<range-end>, <range-start>-<range-end>, <range-start>-<range-end>
  • unit: The unit used for range requests, usually bytes.
  • <range-start>: An integer indicating the starting value of the range in a specific unit.
  • <range-end>: An integer indicating the end value of the range in a specific unit. This value is optional; if absent, it means the range extends to the end of the document.

After understanding the Range syntax, let's take a look at an actual usage example:

1.1.1 Single Range

$ curl http://i.imgur.com/z4d4kWk.jpg -i -H "Range: bytes=0-1023"

1.1.2 Multiple Ranges

$ curl http://www.example.com -i -H "Range: bytes=0-50, 100-150"
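
The same kind of request can also be issued from the browser with fetch. The snippet below is only an illustration (any URL on a range-aware server will do); it requests the first kilobyte and logs the status code together with the Content-Range header that accompanies a 206 Partial Content response (readable cross-origin only if the server exposes it):

async function fetchFirstKilobyte(url) {
  const res = await fetch(url, { headers: { Range: "bytes=0-1023" } });
  // A range-aware server answers 206 and sets Content-Range, e.g. "bytes 0-1023/<total size>"
  console.log(res.status, res.headers.get("Content-Range"));
  // The body holds the requested bytes (or the full file if the server ignored the Range header)
  return res.arrayBuffer();
}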

Well, that's all the knowledge about HTTP range requests. Now let's get down to business and start introducing how to download large files.

2. How to download large files

In order to help you better understand the following content, let's take a look at the overall flowchart:

After understanding the process of large file downloading, let's first define some auxiliary functions involved in the above process.

2.1 Defining auxiliary functions

2.1.1 Define the getContentLength function

As the name suggests, the getContentLength function is used to obtain the length of the file. In this function, we send a HEAD request and then read the Content-Length information from the response header to obtain the content length of the file corresponding to the current URL.

function getContentLength(url) {
  return new Promise((resolve, reject) => {
    let xhr = new XMLHttpRequest();
    xhr.open("HEAD", url);
    xhr.send();
    xhr.onload = function () {
      resolve(
        ~~xhr.getResponseHeader("Content-Length")
       );
    };
    xhr.onerror = reject;
  });
}
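
If you prefer the Fetch API, a roughly equivalent version (just a sketch, not part of the original example code) could look like this:

function getContentLength(url) {
  // Issue a HEAD request and read the Content-Length response header
  return fetch(url, { method: "HEAD" }).then(
    (res) => Number(res.headers.get("Content-Length")) || 0
  );
}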

2.1.2 Define the asyncPool function

In the previous article "How to implement concurrency control in JavaScript?", we introduced the asyncPool function, which is used to implement concurrency control of asynchronous tasks. This function receives 3 parameters:

  • poolLimit (number type): indicates the number of concurrent connections to be limited;
  • array (array type): represents a task array;
  • iteratorFn (function type): represents the iterator function, which is used to process each task item; it should return a Promise or be an async function.
async function asyncPool(poolLimit, array, iteratorFn) {
  const ret = []; // Store all asynchronous tasks
  const executing = []; // Store the asynchronous tasks being executed
  for (const item of array) {
    // Call the iteratorFn function to create an asynchronous task
    const p = Promise.resolve().then(() => iteratorFn(item, array));
    ret.push(p); // Save the new asynchronous task
    // When the poolLimit value is less than or equal to the total number of tasks, perform concurrency control
    if (poolLimit <= array.length) {
      // When the task completes, remove it from the array of executing tasks
      const e = p.then(() => executing.splice(executing.indexOf(e), 1));
      executing.push(e); // Save the executing asynchronous task
      if (executing.length >= poolLimit) {
        await Promise.race(executing); // Wait for the fastest task to complete
      }
    }
  }
  return Promise.all(ret);
}
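
Here is a small usage sketch of asyncPool. The timeout helper below is purely illustrative: with a pool limit of 2, at most two of the three timers run at the same time, and the results keep the order of the input array:

const timeout = (ms) => new Promise((resolve) => setTimeout(() => resolve(ms), ms));

asyncPool(2, [1000, 5000, 3000], timeout).then((results) => {
  console.log(results); // [1000, 5000, 3000]
});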

2.1.3 Define the getBinaryContent function
The getBinaryContent function is used to initiate a range request based on the passed parameters, thereby downloading file data blocks within the specified range:

function getBinaryContent(url, start, end, i) {
  return new Promise((resolve, reject) => {
    try {
      let xhr = new XMLHttpRequest();
      xhr.open("GET", url, true);
      // Set the range request information on the request header
      xhr.setRequestHeader("range", `bytes=${start}-${end}`);
      // Set the response type to arraybuffer
      xhr.responseType = "arraybuffer";
      xhr.onload = function () {
        resolve({
          index: i, // index of the file block
          buffer: xhr.response, // data corresponding to the range request
        });
      };
      xhr.onerror = reject; // reject if the request fails at the network level
      xhr.send();
    } catch (err) {
      reject(new Error(err));
    }
  });
}

Note that ArrayBuffer objects are used to represent generic, fixed-length raw binary data buffers. We cannot manipulate the contents of an ArrayBuffer directly; instead, we go through typed array objects or DataView objects, which represent the data in the buffer in specific formats and use those formats to read and write its contents.
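
The following minimal sketch shows how the same ArrayBuffer is read and written indirectly through a Uint8Array view and a DataView:

const buffer = new ArrayBuffer(4); // 4 bytes of raw binary data
const bytes = new Uint8Array(buffer); // byte-level view over the same memory
bytes[0] = 255;

const view = new DataView(buffer); // format-aware view over the same memory
console.log(view.getUint8(0)); // 255
console.log(view.getUint32(0)); // 4278190080 (0xFF000000, big-endian by default)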

2.1.4 Define the concatenate function

Since we cannot directly operate on ArrayBuffer objects, we first need to convert the ArrayBuffer objects into Uint8Array objects and then perform the concatenation operation. The concatenate function defined below is used to merge the downloaded file data blocks. The specific code is as follows:

function concatenate(arrays) {
  if (!arrays.length) return null;
  let totalLength = arrays.reduce((acc, value) => acc + value.length, 0);
  let result = new Uint8Array(totalLength);
  let length = 0;
  for (let array of arrays) {
    result.set(array, length);
    length += array.length;
  }
  return result;
}
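
A quick usage example (not part of the download flow itself) shows that merging two small Uint8Array chunks works as expected:

const chunk1 = new Uint8Array([1, 2]);
const chunk2 = new Uint8Array([3, 4, 5]);
console.log(concatenate([chunk1, chunk2])); // Uint8Array(5) [1, 2, 3, 4, 5]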

2.1.5 Defining the saveAs function
The saveAs function is used to implement the client file saving function. Here is just a simple implementation. In actual projects, you may consider using FileSaver.js directly.

function saveAs({ name, buffers, mime = "application/octet-stream" }) {
  const blob = new Blob([buffers], { type: mime });
  const blobUrl = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.download = name || Math.random();
  a.href = blobUrl;
  a.click();
  URL.revokeObjectURL(blobUrl); // revoke the Object URL, not the Blob object itself
}

In the saveAs function, we used Blob and Object URL. Object URL is a pseudo-protocol that allows Blob and File objects to be used as URL sources for images, downloadable binary data links, etc. In the browser, we use the URL.createObjectURL method to create an Object URL. This method receives a Blob object and creates a unique URL for it in the form of blob:<origin>/<uuid>. The corresponding example is as follows:

blob:https://example.org/40a5fb5a-d56d-4a33-b4e2-0acf6a8e5f64

The browser internally stores a URL → Blob mapping for each URL generated via URL.createObjectURL. Such URLs are therefore short, yet they still provide access to the underlying Blob. A generated URL is only valid while the current document is open.
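
As a small illustration (with made-up content, unrelated to the download example), the snippet below creates an Object URL for a text Blob and then releases the URL → Blob mapping once it is no longer needed:

const textBlob = new Blob(["hello world"], { type: "text/plain" });
const textBlobUrl = URL.createObjectURL(textBlob); // e.g. blob:https://example.org/<uuid>
console.log(textBlobUrl);

// Revoking removes the URL -> Blob mapping so the Blob can be garbage-collected
URL.revokeObjectURL(textBlobUrl);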

OK, that’s all about Object URLs.

2.1.6 Define the download function

The download function is used to perform the download operation; it accepts an options object with 3 properties:

  • url (string type): the address of the resource to download;
  • chunkSize (number type): the size of the chunk, in bytes;
  • poolLimit (number type): indicates the number of concurrent requests.
async function download({ url, chunkSize, poolLimit = 1 }) {
  const contentLength = await getContentLength(url);
  const chunks = typeof chunkSize === "number" ? Math.ceil(contentLength / chunkSize) : 1;
  const results = await asyncPool(
    poolLimit,
    [...new Array(chunks).keys()],
    (i) => {
      let start = i * chunkSize;
      let end = i + 1 == chunks ? contentLength - 1 : (i + 1) * chunkSize - 1;
      return getBinaryContent(url, start, end, i);
    }
  );
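  // asyncPool resolves with the results in input order, so the chunks are already ordered by index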
  const sortedBuffers = results
    .map((item) => new Uint8Array(item.buffer));
  return concatenate(sortedBuffers);
}

2.2 Example of using large file download

Based on the auxiliary function defined above, we can easily implement parallel downloading of large files. The specific code is as follows:

function multiThreadedDownload() {
  const url = document.querySelector("#fileUrl").value;
  if (!url || !/https?/.test(url)) return;
  console.log("Multi-thread download started: " + +new Date());
  download({
    url,
    chunkSize: 0.1 * 1024 * 1024,
    poolLimit: 6,
  }).then((buffers) => {
    console.log("Multi-thread download ends: " + +new Date());
    saveAs({ buffers, name: "My compressed package", mime: "application/zip" });
  });
}

Since the complete example code is quite long, I will not post the specific code. If you are interested, you can visit the following address to browse the sample code.

Full example code: https://gist.github.com/semlinker/837211c039e6311e1e7629e5ee5f0a42

Here we take a look at the running results of the large file download example:

Conclusion

This article introduced how to use the asyncPool function, borrowed from the async-pool library, to achieve parallel downloading of large files in JavaScript. Besides the asyncPool function, Abaoge also covered related topics such as obtaining the file size through a HEAD request, issuing HTTP range requests, and saving files on the client. In fact, the asyncPool function can be used not only for parallel downloading of large files but also for parallel uploading; interested readers can try it themselves.
