Unicode signature BOM detailed description

Unicode Signature BOM - What is the BOM?
BOM stands for Byte Order Mark. It is the Unicode character U+FEFF placed at the very start of a text stream, where it serves as a signature for the UTF encoding in use. In UTF-16 it is stored as FE FF (big-endian) or FF FE (little-endian), which tells the reader the byte order; in UTF-8 it becomes EF BB BF. The mark is optional. UTF-8 has no byte-order issue, so there the BOM acts only as a signature that lets software detect that a byte stream is UTF-8 encoded. Microsoft software performs this detection, but some other software does not and treats the three bytes as ordinary characters.
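
As a small illustration of the byte sequences mentioned above, the following Python sketch encodes U+FEFF in several UTF encodings and prints the resulting bytes. The helper name `bom_bytes` is made up for this example.

```python
# U+FEFF is the byte order mark; each UTF encoding serializes it differently.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> str:
    """Return the BOM encoded in `encoding` as an uppercase hex string."""
    return " ".join(f"{b:02X}" for b in BOM_CHAR.encode(encoding))


if __name__ == "__main__":
    for name in ("utf-8", "utf-16-be", "utf-16-le"):
        print(name, bom_bytes(name))
```

The explicit `-be` and `-le` codec names are used because Python's plain `utf-16` codec prepends a BOM of its own when encoding.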

Microsoft tools add the three bytes EF BB BF to the start of the UTF-8 text files they save. Programs such as Notepad on Windows use these three bytes to decide whether a text file is ANSI or UTF-8. The Unicode standard permits this signature but neither requires nor recommends it for UTF-8, so in practice it is largely a Microsoft convention: UTF-8 text files created on other platforms usually do not carry it.

Unicode Signature BOM - How to check a UTF-8 file for a BOM

In other words, a UTF-8 file may or may not have a BOM. How can you tell which? Here are four methods.
1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, open the page properties, and see whether "Include Unicode Signature BOM" is checked.
3. Open the file with Windows Notepad, select "Save As", and check whether the default encoding offered for the file is UTF-8 or ANSI. If it is ANSI, the file has no BOM. (Recent versions of Notepad list "UTF-8" and "UTF-8 with BOM" as separate choices.)
4. Open it with EmEditor, select "Save As", and check whether "Add Unicode Signature (BOM) (G)" under Encoding is checked, as shown in the figure.

[Figure: Unicode Signature BOM]
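
The same check can also be done without an editor. The following is a minimal Python sketch, not taken from any of the tools above, that reads the first three bytes of a file and compares them with EF BB BF. The function names `has_utf8_bom` and `file_has_utf8_bom` are hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Check only the first three bytes of the file at `path`."""
    with open(path, "rb") as f:  # binary mode, so no decoding takes place
        return has_utf8_bom(f.read(3))
```

Opening the file in binary mode matters here: in text mode, some codecs would decode or drop the BOM before the code could see it.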

Unicode Signature BOM - Problems and Solutions when Applying in PHP

Note that when Convertz converts a GB2312 file to UTF-8, its default setting is not to include a BOM. Without a BOM, software that relies on the signature may misdetect the encoding and display garbled characters.

If a BOM is included, however, be careful with PHP include files: the bytes EF BB BF sit in front of the opening PHP tag, so PHP sends them to the browser as output before the script runs. This early output can cause program errors, for example when the script later tries to send HTTP headers. One solution is to save all included files as ANSI and keep only the main file as UTF-8.

To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (EF BB BF) with 20, and save (turn off the automatic backup function when saving). Then switch back to the default editing mode and delete the three leading spaces.
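
For many files, a script is easier than editing each one by hand. The sketch below is one possible way to strip the UTF-8 BOM in Python. It is an alternative to the hex-editor procedure above, not a reproduction of it, and the names `strip_utf8_bom` and `remove_bom_from_file` are invented for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM; other input is unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file at `path` without its BOM. Return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    stripped = strip_utf8_bom(original)
    if stripped == original:
        return False  # no BOM found, leave the file untouched
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

The file is only rewritten when a BOM was actually found, so running the function over a whole directory of include files does not touch files that are already clean.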

Unicode Signature BOM - Encoding tips

I also picked up a few facts about encodings along the way. Files saved with the so-called "Unicode" option are actually UTF-16. For characters in the Basic Multilingual Plane, the UTF-16 code unit happens to equal the Unicode code point, but conceptually Unicode and UTF are two different things: Unicode is a character set that assigns a code point to each character, while a UTF is a scheme for storing and transmitting those code points as bytes.

UTF-16 comes in two byte orders: low byte first (LE, little-endian) and high byte first (BE, big-endian). The official UTF encodings also include UTF-32, which likewise comes in LE and BE variants. UTF-7 is a UTF that is not part of the Unicode standard; it was designed mainly for email transmission.

The single-byte range of UTF-8 is compatible with ASCII. UTF-8 came about largely because some old systems and library functions cannot handle UTF-16 correctly. For English text it also saves file space, at the cost of using more space for many non-English characters. An ASCII character takes one byte in both UTF-8 and ISO-8859-1. The remaining ISO-8859-1 characters take two bytes in UTF-8, and all other characters take two to four bytes.
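
To make the byte counts and byte orders concrete, here is a short Python sketch. The sample characters and the helper names `utf8_length` and `hex_bytes` are choices made for this example only.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes used to store `ch` in UTF-8."""
    return len(ch.encode("utf-8"))


def hex_bytes(text: str, encoding: str) -> str:
    """Encode `text` and return the bytes as an uppercase hex string."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


if __name__ == "__main__":
    # "A" is ASCII, U+00E9 is in ISO-8859-1, U+4E2D is a CJK character,
    # and U+1F600 lies outside the Basic Multilingual Plane.
    for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) in UTF-8")

    # The same character in the two UTF-16 byte orders.
    print("UTF-16LE:", hex_bytes("A", "utf-16-le"))
    print("UTF-16BE:", hex_bytes("A", "utf-16-be"))
```

The last two lines show the difference between LE and BE: the same two bytes, in opposite order.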
