Unicode signature BOM (Byte Order Mark) issue for UTF-8 files

I recently ran into something strange while debugging a Chinese Zen Cart website encoded in UTF-8. The page itself displayed normally, but when I used View Source in IE (which opens the source in Notepad), the text was garbled. Firefox did not have this problem. After a lot of searching and testing I solved it: the cause was the Unicode signature, or BOM (Byte Order Mark), of the UTF-8 file.

The BOM (Byte Order Mark) is the code point U+FEFF placed at the very start of a text stream in the UTF encodings. In UTF-16 it is stored as FF FE (little-endian) or FE FF (big-endian), which tells the reader the byte order; in UTF-8 it is stored as EF BB BF. The mark is optional. Since UTF-8 has no byte-order ambiguity, its only use there is as a signature that helps detect whether a byte stream is UTF-8 encoded. Microsoft software does this detection, but some software does not and treats the BOM as an ordinary character.
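To make the byte values concrete, here is a minimal Python sketch (Python is my choice for illustration, the site itself is PHP) that encodes the single code point U+FEFF in each encoding and compares the result with the constants in the standard codecs module. The variable names are only illustrative.

```python
import codecs

BOM = "\ufeff"  # the BOM is one code point, U+FEFF

# Encode the same code point in each scheme (explicit byte order,
# so Python does not add a second BOM of its own).
utf8_bom = BOM.encode("utf-8")
utf16_le_bom = BOM.encode("utf-16-le")
utf16_be_bom = BOM.encode("utf-16-be")

print(utf8_bom.hex())      # efbbbf
print(utf16_le_bom.hex())  # fffe
print(utf16_be_bom.hex())  # feff

# The standard library exposes the same byte sequences as constants.
print(utf8_bom == codecs.BOM_UTF8)          # True
print(utf16_le_bom == codecs.BOM_UTF16_LE)  # True
print(utf16_be_bom == codecs.BOM_UTF16_BE)  # True
```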

Microsoft tools add the three bytes EF BB BF to the start of the UTF-8 text files they save. Programs such as Notepad on Windows use these three bytes to tell whether a text file is UTF-8 or ANSI. However, this is largely a Windows convention: the mark is not required, and UTF-8 text files produced on other platforms usually do not have it.
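The difference between software that recognises the signature and software that does not can be sketched with Python's two UTF-8 codecs: "utf-8-sig" writes and strips the BOM, while plain "utf-8" leaves it in the text as the character U+FEFF. The sample string is made up for illustration.

```python
text = "<html>"

# "utf-8-sig" prepends EF BB BF, the way Notepad does when saving as UTF-8.
with_bom = text.encode("utf-8-sig")
print(with_bom[:3].hex())  # efbbbf

# A BOM-aware reader strips the signature again.
aware = with_bom.decode("utf-8-sig")
print(aware == text)  # True

# A reader that is not BOM-aware keeps it as an ordinary (invisible) character.
naive = with_bom.decode("utf-8")
print(naive == text)           # False
print(naive[0] == "\ufeff")    # True
print(len(naive) - len(text))  # 1
```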

In other words, a UTF-8 file may or may not have a BOM. How do you tell which? There are three methods:
1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, look at the page properties, and see whether "Include Unicode Signature BOM" is checked.
3. Open it with Windows Notepad, select "Save As", and see whether the default encoding offered is UTF-8 or ANSI. If it is ANSI, the file has no BOM.
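The first method (checking the file header for EF BB BF) can also be done without a hex editor. This is a minimal sketch; has_utf8_bom is a hypothetical helper name, not part of any tool mentioned above.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    # Open in binary mode so no decoding happens before we look at the bytes.
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

For example, has_utf8_bom("html_header.php") would report whether that template file carries the signature.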

I found html_header.php among the Zen Cart template files and discovered that it did not have a BOM. I re-saved it in UltraEdit-32 with the BOM added and uploaded it again. After that everything was normal.

Note that when Convertz converts a GB2312 file to UTF-8, its default setting is to not include a BOM, so the garbled characters described above may appear. However, if you do include a BOM, be careful with PHP include files: EF BB BF ends up in front of the PHP code, so those bytes are sent to the browser as output before the script runs, which may cause program errors (for example, when the script later tries to send headers). One solution is to save all included files as ANSI and keep only the main file as UTF-8; this is safe as long as the included files contain only ASCII characters. To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace each of the first three bytes (the damn EF BB BF) with 20, save the file (remember to turn off automatic backup when saving), then switch back to the default editing mode and delete the three leading spaces.
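Removing the BOM can also be scripted instead of done by hand in a hex editor. The sketch below rewrites a file without its first three bytes when they are EF BB BF; strip_utf8_bom is a hypothetical helper name, and it overwrites the file in place, so try it on a copy first.

```python
import codecs


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False if the file
    had no BOM and was left untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three signature bytes.
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Running this over the PHP include files would leave their content unchanged apart from the signature, so nothing is output ahead of the PHP code.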

I also picked up a little background on encodings. A file saved as "Unicode" is actually UTF-16, whose values happen to match the Unicode code points for most characters, but conceptually Unicode and UTF are two different things. Unicode is a character set that assigns a number (a code point) to each character, and a UTF is a scheme for storing and transmitting those numbers as bytes. UTF-16 comes in two byte orders: low byte first (LE, little-endian) and high byte first (BE, big-endian). The official UTF encodings also include UTF-32, which likewise comes in LE and BE. Outside the Unicode standard there is also UTF-7, which is mainly used for email transmission. The single-byte part of UTF-8 is compatible with ASCII, that is, with the first 128 characters of ISO-8859-1. UTF-8 came about mainly because some old systems and library functions cannot handle UTF-16 correctly. For English text it also saves file space (at the expense of more space for some non-English characters). ASCII characters take one byte in both UTF-8 and ISO-8859-1; all other characters take two to four bytes in UTF-8.
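A small Python sketch shows the size trade-off and the two UTF-16 byte orders. The sample characters and the helper name encoded_sizes are my own choices for illustration.

```python
def encoded_sizes(ch):
    """Return (bytes in UTF-8, bytes in UTF-16) for one character, BOM excluded."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# A Latin letter, an accented letter, a Chinese character, and an emoji.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print("U+%04X" % ord(ch), encoded_sizes(ch))
# U+0041 (1, 2)
# U+00E9 (2, 2)
# U+4E2D (3, 2)
# U+1F600 (4, 4)

# Byte order: the same character in the two UTF-16 variants.
print("A".encode("utf-16-le").hex())  # 4100 (low byte first)
print("A".encode("utf-16-be").hex())  # 0041 (high byte first)
```

English text is half the size in UTF-8, while Chinese text takes three bytes per character instead of two.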
