How to use JS to determine whether a file is UTF-8 encoded

Conventional solution

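Use FileReader to read the file as text (readAsText decodes as UTF-8 by default), then decide whether the file is UTF-8 by checking the result for garbled characters.

The decoder replaces every byte sequence that is not valid UTF-8 with the replacement character � (U+FFFD). If � appears in the result, the file is not UTF-8; otherwise it is treated as UTF-8. Note that a valid UTF-8 file that itself contains U+FFFD would also be rejected.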
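To see why the check works: FileReader.readAsText decodes with the Encoding Standard's UTF-8 decoder, and the same decoder is exposed as TextDecoder in browsers and Node.js. This small sketch (the helper name `hasReplacementChar` is hypothetical) shows that valid UTF-8 decodes cleanly, while the GBK bytes for 中 (0xD6 0xD0) come out containing U+FFFD.

```typescript
// Hypothetical helper: decode bytes as UTF-8 (non-fatal, like readAsText) and look for U+FFFD
function hasReplacementChar(bytes: Uint8Array): boolean {
  const text = new TextDecoder("utf-8").decode(bytes);
  return text.includes("\uFFFD");
}

const utf8Bytes = new TextEncoder().encode("中文"); // E4 B8 AD E6 96 87
const gbkBytes = new Uint8Array([0xd6, 0xd0]); // "中" encoded as GBK

console.log(hasReplacementChar(utf8Bytes)); // false
console.log(hasReplacementChar(gbkBytes)); // true
```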

The code is as follows:

const isUtf8 = (file: File): Promise<boolean> => {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();

    // onload fires only on success (onloadend would also fire after an error)
    reader.onload = () => {
      const content = reader.result as string;
      // U+FFFD is what the decoder inserts for byte sequences that are not valid UTF-8
      const encodingRight = !content.includes("\uFFFD");

      if (encodingRight) {
        resolve(true);
      } else {
        reject(new Error("Encoding format error, please upload a UTF-8 file"));
      }
    };

    reader.onerror = () => {
      reject(new Error("Failed to read the file content, please check whether the file is damaged"));
    };

    // With no encoding argument, readAsText decodes as UTF-8 by default
    reader.readAsText(file);
  });
};

The problem with this method is that readAsText loads the entire file into memory as a single string. For a very large file, such as one of several GB, the read may fail and trigger onerror, and sometimes the browser tab crashes outright.

Large file solution

For large files, sample the file instead of reading all of it. Files under 50 MB are still read in full; larger files are divided into 100 equal chunks, and a fixed-size sample (1 MB in the code below) is sliced from the start of each chunk and read as text. Because a slice boundary can fall in the middle of a multi-byte character (most Chinese characters take 3 bytes in UTF-8), the decoded sample may begin or end with � even though the file itself is valid UTF-8. To avoid this false positive, the code skips the first four characters of each sample and checks only a stretch of about half the sample's length after that, so neither boundary is included.
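The boundary problem can be reproduced without a real file. This sketch, assuming an environment with TextEncoder/TextDecoder (any modern browser or Node.js), encodes a run of Chinese text at 3 bytes per character, decodes a 1024-byte window that starts one byte into a character, and then applies the same trimming as the isUtf8 function below. The names `decodeWindow` and `trimSample` are hypothetical.

```typescript
// Hypothetical helper: decode bytes [start, end) as UTF-8, replacing invalid sequences like readAsText does
function decodeWindow(bytes: Uint8Array, start: number, end: number): string {
  return new TextDecoder("utf-8").decode(bytes.subarray(start, end));
}

// Same trimming as isUtf8: skip 4 characters, then keep about half the string's length
function trimSample(str: string): string {
  return str.slice(4, 4 + Math.floor(str.length / 2));
}

const bytes = new TextEncoder().encode("中文".repeat(1000)); // 6000 bytes of valid UTF-8
const raw = decodeWindow(bytes, 1, 1 + 1024); // starts and ends mid-character

console.log(raw.startsWith("\uFFFD"), raw.endsWith("\uFFFD")); // true true
console.log(trimSample(raw).includes("\uFFFD")); // false
```

The untrimmed window would wrongly mark this valid UTF-8 data as garbled; the trimmed one does not.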

The sampling constants in the code below (the 50 MB threshold, 100 samples and 1 MB per sample) can be adjusted as needed.
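Because these constants interact, it can help to pull the offset arithmetic into a pure function and check the byte ranges before touching a File. This is a sketch: `SamplingOptions`, `DEFAULT_SAMPLING` and `sampleRanges` are hypothetical names, and the defaults mirror the constants used in getSamples.

```typescript
// Hypothetical options; defaults mirror the constants used by getSamples
interface SamplingOptions {
  threshold: number; // files smaller than this are checked in full
  sampleCount: number; // number of evenly spaced samples
  sampleSize: number; // bytes per sample
}

const DEFAULT_SAMPLING: SamplingOptions = {
  threshold: 50 * 1024 * 1024,
  sampleCount: 100,
  sampleSize: 1024 * 1024,
};

// Returns [start, end) byte ranges suitable for Blob.slice(start, end)
function sampleRanges(fileSize: number, opts: SamplingOptions = DEFAULT_SAMPLING): Array<[number, number]> {
  if (fileSize < opts.threshold) return [[0, fileSize]];
  const stride = Math.floor(fileSize / opts.sampleCount);
  const ranges: Array<[number, number]> = [];
  for (let i = 0; i < opts.sampleCount; i++) {
    const start = i * stride;
    ranges.push([start, Math.min(start + opts.sampleSize, fileSize)]);
  }
  return ranges;
}

console.log(sampleRanges(10 * 1024 * 1024).length); // 1
console.log(sampleRanges(1000 * 1024 * 1024).length); // 100
```

Each pair can be passed directly to `file.slice(start, end)`.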

The code is as follows:

const getSamples = (file: File): Blob[] => {
  const filesize = file.size;
  const parts: Blob[] = [];
  if (filesize < 50 * 1024 * 1024) {
    // Small file: check the whole thing
    parts.push(file);
  } else {
    const total = 100; // number of samples
    const sampleSize = 1024 * 1024; // bytes per sample
    const chunkSize = Math.floor(filesize / total);
    // Take sampleSize bytes from the start of each of the `total` chunks
    // (Blob.slice clamps an end offset past the end of the file)
    for (let i = 0; i < total; i++) {
      const start = i * chunkSize;
      parts.push(file.slice(start, start + sampleSize));
    }
  }
  return parts;
};

const isUtf8 = (filePart: Blob, isSample: boolean): Promise<void> => {
  return new Promise((resolve, reject) => {
    const fileReader = new FileReader();

    fileReader.onload = () => {
      const str = fileReader.result as string;
      // A sample may start or end in the middle of a multi-byte character, so skip the
      // first 4 characters and check only about half of the string after that
      const sampleStr = isSample ? str.slice(4, 4 + Math.floor(str.length / 2)) : str;
      if (!sampleStr.includes("\uFFFD")) {
        resolve();
      } else {
        reject(new Error("Encoding format error, please upload a UTF-8 file"));
      }
    };

    fileReader.onerror = () => {
      reject(new Error("Failed to read the file content, please check whether the file is damaged"));
    };

    fileReader.readAsText(filePart);
  });
};

export default async function (file: File): Promise<boolean> {
  const samples = getSamples(file);

  for (const filePart of samples) {
    try {
      // Only slices need boundary trimming; a small file read whole is checked in full
      await isUtf8(filePart, filePart !== file);
    } catch {
      return false;
    }
  }
  return true;
}
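If checking only samples is not acceptable, a different technique (not the one this article uses) can scan the whole file with bounded memory: stream the Blob through a TextDecoder created with `fatal: true`, which throws a TypeError on the first invalid sequence instead of inserting �. This sketch assumes `Blob.prototype.stream()` is available (modern browsers, Node.js 18+); `isUtf8Streaming` is a hypothetical name. It still reads every byte, so on multi-GB files it is slower than sampling.

```typescript
// Sketch: validate a whole Blob as UTF-8 chunk by chunk (not the article's sampling method)
async function isUtf8Streaming(blob: Blob): Promise<boolean> {
  // fatal: true makes decode() throw a TypeError on invalid UTF-8 instead of emitting U+FFFD
  const decoder = new TextDecoder("utf-8", { fatal: true });
  const reader = blob.stream().getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps an incomplete multi-byte sequence pending for the next chunk
      decoder.decode(value, { stream: true });
    }
    decoder.decode(); // flush; throws if the data ends in the middle of a character
    return true;
  } catch (err) {
    // Only decoding errors mean "not UTF-8"; rethrow anything else (e.g. a read failure)
    if (!(err instanceof TypeError)) throw err;
    await reader.cancel(); // stop reading the rest of the file
    return false;
  } finally {
    reader.releaseLock();
  }
}
```

Unlike the search for �, this approach also accepts a valid file that literally contains the character U+FFFD.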
