Conventional solution

Use FileReader to read the file as UTF-8 text, then decide whether the file is UTF-8 by checking the decoded content for garbled characters. When the decoder meets bytes that are not valid UTF-8, it substitutes the replacement character � (U+FFFD). So if � appears, the file is treated as not UTF-8; otherwise it is treated as UTF-8. The code is as follows:

    const isUtf8 = async (file: File): Promise<boolean> => {
      return await new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = (): void => {
          const content = reader.result as string;
          // U+FFFD (�) marks bytes that could not be decoded as UTF-8.
          const encodingRight = content.indexOf("\uFFFD") === -1;
          if (encodingRight) {
            resolve(encodingRight);
          } else {
            reject(new Error("Encoding format error, please upload UTF-8 format file"));
          }
        };
        reader.onerror = () => {
          reject(new Error("File content reading failed, please check if the file is damaged"));
        };
        // With no encoding argument, readAsText defaults to UTF-8.
        reader.readAsText(file);
      });
    };

The problem with this method is that if the file is very large, such as several GB, the browser loads the entire content into memory. The FileReader instance then triggers onerror, and sometimes the browser crashes outright.

Large file solution

For large files, sample the content instead of reading all of it: slice the file into 100 parts and, from each part, take the first 1 KB and read it as a string. If a 1024-byte cut lands in the middle of a multi-byte character (a Chinese character, for example), � may appear at the beginning or end of the decoded string even though the file is valid UTF-8, and the segment would wrongly be judged non-UTF-8. To avoid this false positive, skip the first few characters and check only roughly the first half of the decoded string for �. The constants above can be adjusted as required.
The code is as follows:

    const getSamples = (file: File): Blob[] => {
      const filesize = file.size;
      const parts: Blob[] = [];
      if (filesize < 50 * 1024 * 1024) {
        // Small file (under 50 MB): read it whole.
        parts.push(file);
      } else {
        const total = 100; // number of samples
        const sampleSize = 1024; // 1 KB per sample, as described above
        const chunkSize = Math.floor(filesize / total);
        for (let i = 0; i < total; i++) {
          const start = i * chunkSize;
          parts.push(file.slice(start, start + sampleSize));
        }
      }
      return parts;
    };

    // trimEdges: ignore the edges of a sample, where a cut character may decode as U+FFFD.
    const isUtf8 = (filePart: Blob, trimEdges: boolean): Promise<void> => {
      return new Promise((resolve, reject) => {
        const fileReader = new FileReader();
        fileReader.onload = () => {
          const str = fileReader.result as string;
          // For a sample, skip the first 4 characters and take roughly half of the string.
          const sampleStr = trimEdges ? str.slice(4, 4 + Math.floor(str.length / 2)) : str;
          if (sampleStr.indexOf("\uFFFD") === -1) {
            resolve();
          } else {
            reject(new Error("Encoding format error, please upload UTF-8 format file"));
          }
        };
        fileReader.onerror = () => {
          reject(new Error("File content reading failed, please check if the file is damaged"));
        };
        fileReader.readAsText(filePart);
      });
    };

    export default async function (file: File): Promise<boolean> {
      const samples = getSamples(file);
      // Only sampled slices can start or end mid-character; a whole file is checked in full.
      const sampled = samples.length > 1;
      for (const filePart of samples) {
        try {
          await isUtf8(filePart, sampled);
        } catch (error) {
          return false;
        }
      }
      return true;
    }
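One limitation of searching for � is that a valid UTF-8 file may itself contain U+FFFD. As a rough sketch of an alternative (not the method used in this article), the standard TextDecoder can be created with fatal: true, so that invalid bytes throw an error instead of being replaced. The helper names trimToCharBoundaries and isValidUtf8Sample are hypothetical, and the sketch assumes the sample is already available as a Uint8Array. It trims partial characters at both ends of the sample, which addresses the cut-character problem described above at the byte level.

```typescript
// Hypothetical helper: drop partial UTF-8 sequences at both ends of a byte sample.
function trimToCharBoundaries(bytes: Uint8Array): Uint8Array {
  let start = 0;
  // Up to 3 leading continuation bytes (10xxxxxx) may belong to a character
  // that began before the sample; skip them.
  while (start < bytes.length && start < 3 && (bytes[start] & 0xc0) === 0x80) {
    start++;
  }

  let end = bytes.length;
  // Walk back over up to 3 trailing continuation bytes to reach the last lead byte.
  let i = end - 1;
  let back = 0;
  while (i >= start && back < 3 && (bytes[i] & 0xc0) === 0x80) {
    i--;
    back++;
  }
  if (i >= start) {
    const lead = bytes[i];
    // Sequence length announced by the lead byte.
    const need = lead >= 0xf0 ? 4 : lead >= 0xe0 ? 3 : lead >= 0xc0 ? 2 : 1;
    // If the last character is cut off by the end of the sample, drop it.
    if (need > end - i) {
      end = i;
    }
  }
  return bytes.subarray(start, end);
}

// Hypothetical helper: strict UTF-8 check of one sample.
function isValidUtf8Sample(bytes: Uint8Array): boolean {
  // fatal: true makes decode() throw on invalid input instead of emitting U+FFFD.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  try {
    decoder.decode(trimToCharBoundaries(bytes));
    return true;
  } catch {
    return false;
  }
}
```

In a browser, each sampled slice could be converted with new Uint8Array(await filePart.arrayBuffer()) before being passed to isValidUtf8Sample. Because edge bytes are trimmed unconditionally, an invalid byte sitting exactly at the edge of a sample can go unnoticed, which is acceptable for sampling but not for a full validation.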