Conventional solution

Use FileReader to read the file as UTF-8 text, then check the decoded content for garbled characters. When the decoder meets bytes that are not valid UTF-8, it substitutes the replacement character � (U+FFFD). So if � appears in the content, the file is not UTF-8 encoded; otherwise it is treated as UTF-8. The code is as follows:

```typescript
const isUtf8 = async (file: File): Promise<boolean> => {
  return await new Promise<boolean>((resolve, reject) => {
    const reader = new FileReader();
    // Attach the handlers before starting the read.
    reader.onload = (): void => {
      const content = reader.result as string;
      // U+FFFD (�) marks bytes that could not be decoded as UTF-8.
      const encodingRight = content.indexOf("\uFFFD") === -1;
      if (encodingRight) {
        resolve(encodingRight);
      } else {
        reject(new Error("Encoding format error, please upload UTF-8 format file"));
      }
    };
    reader.onerror = () => {
      reject(new Error("File content reading failed, please check if the file is damaged"));
    };
    // Decode the whole file as UTF-8.
    reader.readAsText(file, "UTF-8");
  });
};
```

The problem with this method is that the browser loads the entire file content into memory. If the file is very large, such as several GB, the FileReader instance may fire onerror straight away, and sometimes the browser crashes outright.

Large file solution

For large files, sample the content instead of reading all of it: slice the file and test only a small piece of each slice. Here the file is divided into 100 parts, and the first 1 KB of each part is read as a string. Because the 1024-byte cut can land in the middle of a multi-byte character (a Chinese character, for example), � may appear at the very beginning or end of the decoded string even when the file is valid UTF-8, and the sample would wrongly be judged non-UTF-8. To avoid this, skip the first few characters, take roughly the first half of the decoded string, and check only that part for �. The constants above can be adjusted as required.
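To see why a cut in the middle of a character produces � at the edges of a sample, the minimal sketch below decodes a deliberately truncated byte sequence. It uses TextDecoder rather than FileReader so that it can also run outside the browser; both replace undecodable bytes with U+FFFD. The helper name `decodeLenient` is illustrative and not part of the article's code.

```typescript
const REPLACEMENT = "\uFFFD"; // the � character

// Illustrative helper: lenient UTF-8 decoding, where invalid bytes
// become U+FFFD instead of raising an error.
const decodeLenient = (bytes: Uint8Array): string =>
  new TextDecoder("utf-8").decode(bytes);

// "中文" occupies 6 bytes in UTF-8 (3 bytes per character).
const bytes = new TextEncoder().encode("中文");

// Intact bytes decode cleanly.
const intact = decodeLenient(bytes);

// Dropping the first byte breaks the first character only.
const cutAtStart = decodeLenient(bytes.slice(1));

// Dropping the last byte breaks the last character only.
const cutAtEnd = decodeLenient(bytes.slice(0, bytes.length - 1));

console.log(intact.includes(REPLACEMENT));       // false
console.log(cutAtStart.startsWith(REPLACEMENT)); // true
console.log(cutAtEnd.endsWith(REPLACEMENT));     // true
```

The damage stays confined to the edges of the decoded string, which is why checking only an inner portion of each sample avoids false alarms.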
The sampling code is as follows:

```typescript
const getSamples = (file: File): Blob[] => {
  const filesize = file.size;
  const parts: Blob[] = [];
  if (filesize < 50 * 1024 * 1024) {
    // Files under 50 MB are read in one piece.
    parts.push(file);
  } else {
    const total = 100; // number of samples
    const sampleSize = 1024; // bytes read from the start of each part (1 KB)
    const chunkSize = Math.floor(filesize / total);
    for (let i = 0; i < total; i++) {
      const start = i * chunkSize;
      parts.push(file.slice(start, start + sampleSize));
    }
  }
  return parts;
};

const isUtf8 = (filePart: Blob): Promise<void> => {
  return new Promise<void>((resolve, reject) => {
    const fileReader = new FileReader();
    fileReader.onload = () => {
      const str = (fileReader.result as string) ?? "";
      // Skip the first few characters and keep roughly half of the string,
      // so a character cut at either edge of the sample is ignored.
      const sampleStr = str.slice(4, 4 + Math.floor(str.length / 2));
      // U+FFFD (�) marks bytes that could not be decoded as UTF-8.
      if (sampleStr.indexOf("\uFFFD") === -1) {
        resolve();
      } else {
        reject(new Error("Encoding format error, please upload UTF-8 format file"));
      }
    };
    fileReader.onerror = () => {
      reject(new Error("File content reading failed, please check if the file is damaged"));
    };
    fileReader.readAsText(filePart, "UTF-8");
  });
};

export default async function (file: File): Promise<boolean> {
  const samples = getSamples(file);
  let res = true;
  for (const filePart of samples) {
    try {
      await isUtf8(filePart);
    } catch (error) {
      // One bad sample is enough to reject the file.
      res = false;
      break;
    }
  }
  return res;
}
```
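Discarding half of every sample is a simple way to ignore characters cut at the sample edges. As a possible alternative, and not the article's method, a sample can be trimmed at byte level and then decoded strictly with `TextDecoder` and `fatal: true`, which throws on invalid input instead of inserting �. The sketch below assumes the sample bytes have already been obtained, for example with `new Uint8Array(await filePart.arrayBuffer())`; the names `sequenceLength` and `isUtf8Bytes` are hypothetical.

```typescript
// Hypothetical helper: expected length of a UTF-8 sequence given its lead byte.
// Continuation bytes and invalid lead bytes return 1 so the strict decoder
// still sees them and rejects them.
const sequenceLength = (lead: number): number => {
  if (lead >= 0xc0 && lead <= 0xdf) return 2;
  if (lead >= 0xe0 && lead <= 0xef) return 3;
  if (lead >= 0xf0 && lead <= 0xf7) return 4;
  return 1;
};

const isContinuation = (byte: number): boolean => (byte & 0xc0) === 0x80;

// Hypothetical helper: strict UTF-8 check that tolerates a character
// cut off at the start or end of the sample.
const isUtf8Bytes = (bytes: Uint8Array): boolean => {
  // Skip up to 3 leading continuation bytes: they may belong to a
  // character that started before this sample.
  let start = 0;
  while (start < bytes.length && start < 3 && isContinuation(bytes[start])) {
    start++;
  }

  // Walk back over up to 3 trailing continuation bytes to find the last lead byte.
  let end = bytes.length;
  let i = end - 1;
  let steps = 0;
  while (i >= start && steps < 3 && isContinuation(bytes[i])) {
    i--;
    steps++;
  }
  // Drop the final sequence if the sample ends before it is complete.
  if (i >= start && end - i < sequenceLength(bytes[i])) {
    end = i;
  }

  try {
    // fatal: true makes decode() throw a TypeError on invalid UTF-8.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes.subarray(start, end));
    return true;
  } catch {
    return false;
  }
};
```

With this variant the whole sample is validated except for the few bytes at each edge, at the cost of slightly more code than the string-based check.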