Unicode signature BOM detailed description

Unicode Signature BOM - What is the BOM?
BOM is the abbreviation of Byte Order Mark. It is the code point U+FEFF, placed at the start of a text stream to identify the UTF encoding scheme and its byte order. In UTF-16 it appears as FE FF (big-endian) or FF FE (little-endian), and in UTF-8 it becomes EF BB BF. The mark is optional. Since UTF-8 bytes have no order, the mark there serves only as a signature that can be used to detect whether a byte stream is UTF-8 encoded. Microsoft software does this detection, but some software does not and treats the three bytes as normal characters.

Microsoft tools add the three bytes EF BB BF at the start of the UTF-8 text files they save. Programs such as Notepad on Windows use these three bytes to decide whether a text file is ANSI or UTF-8. The Unicode Standard permits this signature but neither requires nor recommends it for UTF-8, and UTF-8 text files on other platforms usually do not carry it.
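As a quick check of the byte sequences mentioned above, the following minimal Python sketch encodes U+FEFF in each UTF form and prints the result in hexadecimal. The helper name bom_bytes is made up for this illustration, and bytes.hex with a separator assumes Python 3.8 or later.

```python
# U+FEFF is the code point used as the byte order mark.
BOM_CHAR = "\ufeff"


def bom_bytes(encoding: str) -> str:
    """Return the BOM as stored in the given encoding, as spaced hex."""
    # The "-be"/"-le" codec names encode with a fixed byte order and do
    # not add a BOM of their own, so the output is exactly U+FEFF.
    return BOM_CHAR.encode(encoding).hex(" ").upper()


for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {bom_bytes(name)}")
```

The UTF-8 line prints EF BB BF, and the two UTF-16 lines print FE FF and FF FE, which is how a reader tells the byte orders apart.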

Unicode Signature BOM - How to check whether a UTF-8 file has a BOM

In other words, a UTF-8 file may or may not have a BOM. How can you tell which?
There are four methods.
1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, check the page properties, and see whether "Include Unicode Signature BOM" is checked.
3. Open the file with Windows Notepad, select "Save As", and check whether the default encoding of the file is UTF-8 or ANSI. If it is ANSI, the file has no BOM.

4. Open it with EmEditor, select "Save As", and check whether "Add Unicode Signature (BOM)" under Encoding is checked.
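The same check can be done without an editor by reading the first few bytes of the file. The following is a minimal Python sketch, not a replacement for the tools above; the function names detect_bom and file_bom are hypothetical, while the BOM constants come from Python's standard codecs module.

```python
import codecs
from typing import Optional

# UTF-32 is tested first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so the longer signature must win.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def file_bom(path: str) -> Optional[str]:
    """Read at most four bytes of the file and report its BOM, if any."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A result of None only means that no BOM is present. The file may still be valid UTF-8, since the mark is optional.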

Unicode Signature BOM - Problems and Solutions when Applying in PHP

Note that when ConvertZ converts a GB2312 file to UTF-8, its default setting is to omit the BOM, and a file without a BOM may be displayed as garbled characters by software that relies on the mark for detection. However, if the BOM is included, be careful with PHP include files: EF BB BF ends up at the front of the PHP output stream, and sending it to the browser prematurely can cause program errors (for example, header() calls fail with "headers already sent"). One solution is to save all included files as ANSI and keep only the main file as UTF-8. To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (EF BB BF) with 20, save (turn off the automatic backup function before saving), then switch back to the default editing mode and delete the three leading spaces.
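The manual hex-editor procedure can also be scripted. The sketch below, written in Python rather than PHP, removes a leading EF BB BF from a file and leaves files without a BOM untouched. The function names are hypothetical, and it assumes the file is small enough to read into memory.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM. Return True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if stripped == data:
        return False  # no BOM found, so the file is not rewritten
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

Calling strip_bom_in_place on each included file would have the same effect as the UltraEdit steps. Keeping a backup copy first is a sensible precaution, since the file is overwritten.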

Unicode Signature BOM - Encoding tips

I also picked up some background on encodings. Files saved as "Unicode" are actually UTF-16, whose 16-bit units coincide with the Unicode code point for most characters, but conceptually Unicode and UTF are two different things. Unicode is a character set that assigns a code point to each character, and a UTF is a scheme for saving and transmitting those code points as bytes. UTF-16 comes in two byte orders: high byte first (BE, big-endian) and low byte first (LE, little-endian). The official UTF encodings also include UTF-32, which is likewise divided into LE and BE. UTF-7, which is not part of the Unicode Standard, is mainly used for email transmission. The single-byte part of UTF-8 is compatible with ASCII. UTF-8 exists mainly because some old systems and library functions cannot handle UTF-16 correctly. For English text it also saves file space, at the expense of extra space for many non-English characters. ASCII characters take one byte in both UTF-8 and ISO-8859-1. All other characters take two to four bytes in UTF-8.
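The byte orders and byte counts described above can be checked with a short Python sketch. The helper name encoded is made up for this illustration, and the sample characters (A, é, 中, and an emoji) are arbitrary choices written as escapes so that the snippet does not depend on the encoding of its own source file.

```python
def encoded(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as spaced, upper-case hex."""
    return text.encode(encoding).hex(" ").upper()


# UTF-16 byte order for "A" (U+0041): BE stores the high byte first.
print(encoded("A", "utf-16-be"))  # 00 41
print(encoded("A", "utf-16-le"))  # 41 00

# UTF-8 length grows with the code point: 1, 2, 3 and 4 bytes.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {encoded(ch, 'utf-8')}")

# U+00E9 is one byte in ISO-8859-1 but two bytes in UTF-8.
print(encoded("\u00e9", "iso-8859-1"))  # E9
```

Only the first sample, an ASCII character, has the same single byte in UTF-8 and ISO-8859-1.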

