The BOM (Byte Order Mark) is the Unicode character U+FEFF, placed at the start of a text stream to identify its encoding and byte order. In UTF-16 it appears as FF FE (little-endian) or FE FF (big-endian), and in UTF-8 it becomes EF BB BF. The mark is optional. Since UTF-8 bytes have no order, the BOM there serves only as a signature for detecting whether a byte stream is UTF-8 encoded. Microsoft software does this detection, but some software does not and treats the BOM as a normal character.

Microsoft adds the three bytes EF BB BF to the front of its own UTF-8 text files. Programs such as Notepad on Windows use these three bytes to determine whether a text file is ASCII or UTF-8. However, this is mostly a Windows convention. On other platforms, UTF-8 text files usually do not carry such a mark.

That is to say, a UTF-8 file may or may not have a BOM, so how do you tell them apart? There are three methods.

1. Open the file with UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it with Dreamweaver, check the page properties, and see whether "Include Unicode Signature BOM" is ticked.
3. Open it with Windows Notepad, select "Save As", and check whether the default encoding offered is UTF-8 or ANSI. If it is ANSI, the file has no BOM.

I found html_header.php in the Zen Cart template files and discovered that the file did not have a BOM. I added the BOM by saving it with UltraEdit-32, then uploaded html_header.php again. Everything was normal.

Note that when Convertz converts a gb2312 file to UTF-8, its default setting is to not include a BOM. The garbled characters described above may appear without a BOM. However, if a BOM is included, be careful with PHP include files: EF BB BF sits in front of the opening PHP tag, so PHP sends those bytes to the output before any code runs. This early output may cause program errors, for example when the script later tries to send headers. One solution is to save all included files as ANSI and keep only the main file as UTF-8.
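The header check from method 1 can also be done without a hex editor. The following is a minimal Python sketch, not a tool mentioned in this article: the names `detect_bom` and `file_bom` are made up for illustration, and it only recognizes the UTF-8, UTF-16 and UTF-32 marks defined in Python's standard `codecs` module.

```python
import codecs

# Known BOM byte sequences. The UTF-32 marks are listed first because
# the UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def file_bom(path: str):
    """Read only the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

Calling `file_bom("html_header.php")` would return `"utf-8"` for a file saved with a BOM and `None` for one saved without it. A result of `None` does not prove the file is ANSI; it only says that no mark is present.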
To remove the BOM from a file, open it with UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (those annoying EF BB BF) with 20, and save the file (remember to turn off the automatic backup function when saving). Then switch back to the default editing mode and delete the three leading spaces.

I also picked up a little knowledge about encodings. Files saved as so-called "Unicode" are actually UTF-16. For most common characters the UTF-16 value happens to equal the Unicode code point, but conceptually Unicode and UTF are two different things: Unicode is a character set that assigns a number (code point) to each character, and a UTF is a scheme for storing and transmitting those code points as bytes. UTF-16 comes in two byte orders: low byte first (LE, little-endian) and high byte first (BE, big-endian). The official UTF encodings also include UTF-32, which likewise comes in LE and BE variants. UTF-7, which is not part of the Unicode standard, was mainly used for email transmission.

The single-byte part of UTF-8 is compatible with ASCII, which is also the first 128 characters of ISO-8859-1. UTF-8 came about mainly because some old systems and library functions cannot handle UTF-16 correctly. For English text it also saves file space (at the expense of using more space for some non-English characters). ASCII characters take one byte in both UTF-8 and ISO-8859-1. Other characters take two to four bytes in UTF-8, including the accented characters that ISO-8859-1 stores in a single byte.
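The BOM removal and the byte-order and byte-length points above can be checked with a short script. This is a sketch in Python under a few assumptions: the function names `strip_utf8_bom`, `remove_bom_from_file` and `utf8_len` are invented for illustration, the file is small enough to read into memory, and only the UTF-8 mark is handled.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its UTF-8 BOM. Return True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if len(stripped) == len(data):
        return False  # no BOM, leave the file untouched
    with open(path, "wb") as f:
        f.write(stripped)
    return True


def utf8_len(ch: str) -> int:
    """Number of bytes the given text occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Byte order: the same character "A" (U+0041) in both UTF-16 variants.
    print("A".encode("utf-16-le").hex())  # 4100 (low byte first)
    print("A".encode("utf-16-be").hex())  # 0041 (high byte first)
    # Byte lengths in UTF-8 for an ASCII, a Latin-1, and a Chinese character.
    print(utf8_len("A"), utf8_len("é"), utf8_len("中"))  # 1 2 3
```

Unlike the hex-editor procedure, this drops the three bytes directly instead of turning them into spaces first. Keep a copy of the original file before running `remove_bom_from_file`, since it overwrites the file in place.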