Unicode signature BOM detailed description

Unicode Signature BOM - What is the BOM?
BOM stands for Byte Order Mark. It is the code point U+FEFF placed at the very start of a text stream to signal the encoding and byte order in the UTF encoding schemes. In UTF-16 it appears as FE FF (big-endian) or FF FE (little-endian), and in UTF-8 it becomes EF BB BF. The mark is optional. Because UTF-8 has no byte-order ambiguity, the BOM there serves only as a signature that lets a program detect that a byte stream is UTF-8 encoded. Microsoft software performs this detection, but some software does not and treats the three bytes as ordinary characters.

Microsoft tools add the three bytes EF BB BF at the start of the UTF-8 text files they save. Programs such as Notepad on Windows use these three bytes to decide whether a text file is ASCII (ANSI) or UTF-8. The Unicode standard permits this signature for UTF-8 but does not require it, and UTF-8 text files produced on other platforms usually do not carry it.
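The byte sequences mentioned above all come from encoding the same code point, U+FEFF, in different encodings. A minimal Python sketch (the helper name `bom_bytes` is made up for this illustration) prints them side by side:

```python
# U+FEFF is the BOM code point; its byte form depends on the encoding.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> bytes:
    """Return the bytes that U+FEFF becomes in the given encoding."""
    return BOM_CHAR.encode(encoding)


# The explicit "-be"/"-le" codecs do not prepend a BOM of their own,
# so the output is exactly the encoded form of U+FEFF.
for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {bom_bytes(name).hex(' ').upper()}")
```

Note that the UTF-16 little-endian form (FF FE) is also the first two bytes of the UTF-32 little-endian form, which matters when detecting the encoding from the first bytes of a file.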

Unicode Signature BOM - How to check whether a UTF-8 file has a BOM

In other words, a UTF-8 file may or may not have a BOM. How can you tell which? Here are four methods.
1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, check the page properties, and see whether "Include Unicode Signature BOM" is checked.
3. Open the file with Windows Notepad, select "Save As", and check whether the default encoding offered is UTF-8 or ANSI. If it is ANSI, the file has no BOM.
4. Open it with EmEditor, select "Save As", and check whether "Add Unicode Signature (BOM)" under Encoding is checked, as shown in the figure:

[Figure: Unicode Signature BOM]
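The same check can be done without an editor by reading the first few bytes of the file. The following is a rough sketch, not a full encoding detector; the function names `detect_bom` and `file_bom` are invented for this example, and the BOM constants come from Python's standard `codecs` module:

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def file_bom(path):
    """Check only the first four bytes of the file at `path`."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A result of `None` only means there is no signature. The file may still be UTF-8 without a BOM, which this sketch does not try to guess.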

Unicode Signature BOM - Problems and Solutions when Applying in PHP

Note that when ConvertZ converts a GB2312 file to UTF-8, its default setting is to omit the BOM. Without a BOM, programs that rely on it to recognize UTF-8 may display garbled characters. If the BOM is included, however, be careful with PHP include files: the bytes EF BB BF sit in front of the PHP code in the byte stream, so PHP sends them to the output before any code runs. This early output can cause program errors, for example a "headers already sent" warning when the script later calls header() or session_start(). One solution is to save all included files as ANSI and keep only the main file as UTF-8. To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (EF BB BF) with 20 (the space character), save (turn off the automatic backup function before saving), then switch back to the default editing mode and delete the three leading spaces.
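The manual hex-editor procedure can also be scripted. Below is a minimal sketch, assuming the file is small enough to read into memory; `strip_utf8_bom` is a name chosen for this example, not an existing tool:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path) -> bool:
    """Remove a leading EF BB BF from the file; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM, leave the file untouched
    # Rewrite the file without the first three bytes.
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running it over every included PHP file before deployment would remove the stray bytes in one pass. When only reading text in Python, opening the file with `encoding="utf-8-sig"` skips a leading BOM if one is present, without modifying the file.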

Unicode Signature BOM - Encoding tips

I also picked up a few facts about encodings. Files saved as "Unicode" (for example in Notepad) are actually UTF-16; for most characters the UTF-16 code unit happens to equal the Unicode code point, but conceptually Unicode and UTF are two different things. Unicode is a character set that assigns a number (code point) to each character, and a UTF is a scheme for storing and transmitting those code points as bytes. UTF-16 comes in two byte orders: high byte first (BE, big-endian) and low byte first (LE, little-endian). The official UTF encodings also include UTF-32, which is likewise divided into LE and BE. UTF-7 is not part of the Unicode standard; it was mainly used for email transmission.

The single-byte part of UTF-8 is compatible with ASCII, not with all of ISO-8859-1. UTF-8 exists mainly because some old systems and library functions assume byte-oriented text and cannot handle UTF-16 correctly. For English text it also saves file space (at the expense of using more space for many non-English characters). ASCII characters take one byte in both UTF-8 and ISO-8859-1. The upper half of ISO-8859-1 (0x80 to 0xFF) takes two bytes in UTF-8, and other characters take two to four bytes.
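These size and byte-order differences are easy to check directly. A small Python sketch follows; the sample characters and the helper `utf8_len` are arbitrary choices for illustration:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, Latin-1 letter, CJK character, emoji outside the BMP.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    utf16 = ch.encode("utf-16-le")
    print(f"U+{ord(ch):04X}  utf-8: {utf8_len(ch)} bytes  utf-16: {len(utf16)} bytes")

# Byte order: big-endian stores the high byte first, little-endian the low byte first.
print("A".encode("utf-16-be").hex(" "))  # high byte 00 comes first
print("A".encode("utf-16-le").hex(" "))  # low byte 41 comes first
```

The second sample (U+00E9) is a single byte in ISO-8859-1 but two bytes in UTF-8, which is why the two encodings agree only on the ASCII range.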
