GBK or UTF-8: how to choose, understand, and correctly use web page encodings

Web page encoding is the character encoding in which a web page's text is stored and which the page declares to the browser, so that the browser decodes the bytes into the right characters.
GBK is an extension of the national standard GB2312 and is backward compatible with it. GBK is a variable-width encoding: ASCII characters such as English letters take one byte, while Chinese characters take two bytes. To tell the two apart, the first byte of every two-byte character has its highest bit set to 1. GBK covers the Chinese characters in common use (more than 20,000, both simplified and traditional), but it is a national code and is less universal than UTF-8. In exchange, Chinese text stored as UTF-8 takes more database space than the same text stored as GBK.
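The byte layout described above can be checked directly. The following is a minimal sketch that uses Python's built-in `gbk` codec; the helper name `high_bit_set` is made up for this illustration.

```python
# Encode one ASCII letter and one Chinese character as GBK.
ascii_bytes = "A".encode("gbk")        # ASCII stays a single byte
hanzi_bytes = "\u4e2d".encode("gbk")   # U+4E2D (the character "zhong") takes two bytes


def high_bit_set(byte: int) -> bool:
    """Return True if the most significant bit of the byte is 1 (hypothetical helper)."""
    return (byte & 0x80) != 0


print(len(ascii_bytes), high_bit_set(ascii_bytes[0]))   # 1 False
print(len(hanzi_bytes), high_bit_set(hanzi_bytes[0]))   # 2 True
```

A decoder can therefore look at the highest bit of a byte to decide whether it starts a one-byte or a two-byte character.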

UTF-8 (Unicode Transformation Format, 8-bit) permits a BOM but is usually saved without one. It is a variable-width encoding designed for international text: it uses 8 bits (one byte) for English letters and other ASCII characters, 24 bits (three bytes) for most Chinese characters, and up to four bytes for rarer characters. UTF-8 can represent every Unicode character, so it covers the scripts of all countries and is an international encoding with strong versatility. UTF-8 text can be displayed by any browser that supports the UTF-8 character set, whatever the user's country. For example, a Chinese page encoded as UTF-8 can be displayed in a foreign user's English-language IE, typically without downloading IE's Chinese language support package.
Although UTF-8 has good international compatibility, Chinese text stored as UTF-8 needs about 50% more database space than the same text stored as GBK or BIG5, because each character takes three bytes instead of two. When database size is the main concern, UTF-8 is therefore not the recommended choice for a Chinese-only site and mainly suits users who have requirements for international compatibility. Simply put: for websites made up mostly of Chinese characters, GBK saves database space. For websites with mostly English content, ASCII characters take one byte in both encodings, so UTF-8 costs no extra space and adds international coverage.
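As a quick illustration of UTF-8's variable width and of the optional BOM, here is a small sketch using Python's standard codecs. The sample characters are arbitrary choices for this example, not taken from the article.

```python
import codecs

# One sample character per UTF-8 length class (arbitrary examples).
samples = {
    "ascii letter": "A",            # 1 byte
    "accented e": "\u00e9",         # 2 bytes
    "chinese": "\u4e2d",            # 3 bytes
    "emoji": "\U0001F600",          # 4 bytes
}
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(utf8_lengths)

# The "utf-8-sig" codec writes the optional 3-byte BOM; plain "utf-8" does not.
with_bom = "\u4e2d".encode("utf-8-sig")
without_bom = "\u4e2d".encode("utf-8")
print(with_bom.startswith(codecs.BOM_UTF8), without_bom.startswith(codecs.BOM_UTF8))
```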
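The storage difference can be measured rather than guessed. The sketch below compares encoded sizes for a purely Chinese string and a purely English string; the sample strings and the helper name `encoded_sizes` are assumptions made for this illustration, and a real page mixes both kinds of text plus ASCII markup.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (size in GBK bytes, size in UTF-8 bytes) for the given text."""
    return len(text.encode("gbk")), len(text.encode("utf-8"))


# "\u7f51\u9875\u7f16\u7801" is the four-character phrase for "web page encoding".
chinese_text = "\u7f51\u9875\u7f16\u7801" * 100
english_text = "web page encoding " * 100

gbk_cn, utf8_cn = encoded_sizes(chinese_text)
gbk_en, utf8_en = encoded_sizes(english_text)

print(gbk_cn, utf8_cn, utf8_cn / gbk_cn)   # Chinese text: UTF-8 is 1.5x the size of GBK
print(gbk_en, utf8_en)                     # English text: identical sizes
```

Running the same function over a sample of a site's real content gives a more honest estimate of the overhead than the 50% figure alone.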

How do you convert GBK or GB2312 text to UTF-8? The conversion always goes through Unicode: GBK/GB2312 -> Unicode -> UTF-8, and in the other direction UTF-8 -> Unicode -> GBK/GB2312. In Windows Notepad, "Save As" can convert a file between ANSI (which is GBK on a Simplified Chinese system), Unicode (UTF-16 little endian), Unicode big endian, and UTF-8.
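The two-step conversion through Unicode maps naturally onto Python, where `bytes.decode` produces a Unicode string and `str.encode` produces bytes. This is a minimal sketch; the function names are made up for this example.

```python
def gbk_to_utf8(data: bytes) -> bytes:
    """Convert GBK (or GB2312) bytes to UTF-8 bytes via a Unicode string."""
    text = data.decode("gbk")       # GBK bytes -> Unicode
    return text.encode("utf-8")     # Unicode -> UTF-8 bytes


def utf8_to_gbk(data: bytes) -> bytes:
    """Convert UTF-8 bytes to GBK bytes via a Unicode string."""
    text = data.decode("utf-8")     # UTF-8 bytes -> Unicode
    return text.encode("gbk")       # Unicode -> GBK bytes


original = "\u4e2d\u6587".encode("gbk")    # the word for "Chinese", stored as GBK
converted = gbk_to_utf8(original)
print(len(original), len(converted))       # 4 6
```

Note that `utf8_to_gbk` raises `UnicodeEncodeError` when the text contains a character that GBK cannot represent, which is the practical meaning of GBK being less universal.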

How do you make the browser identify the page encoding correctly? A web page should normally contain a tag such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312"> in its <head>, which declares that the page's character set is GB2312 (use charset=UTF-8 for a UTF-8 page).
Why do garbled characters sometimes appear even though the page declares an encoding? The usual reason is that the declared encoding does not match the encoding the file was actually saved in. Most often this happens when the page was opened with the wrong encoding and then saved, or when the file was edited directly on the server with FTP software such as CuteFTP whose encoding setting was wrong, so the file was converted to the wrong encoding. To fix it, open the file in Windows Notepad and use "Save As" to save it in the encoding the page declares.
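To check what a page declares, the charset can be read out of the meta tag. The sketch below uses a simple regular expression rather than a full HTML parser, so it is an approximation; the name `declared_charset` and the 1024-byte search window are assumptions made for this example.

```python
import re

# Matches both <meta http-equiv=... content="...; charset=X"> and <meta charset="X">.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)""",
    re.IGNORECASE,
)


def declared_charset(html: bytes):
    """Return the charset named in the first meta tag, lowercased, or None."""
    match = META_CHARSET.search(html[:1024])   # only look near the top of the page
    return match.group(1).decode("ascii").lower() if match else None


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"></head>'
print(declared_charset(page))   # gb2312
```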
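The mismatch between declared and actual encoding can be detected, and repaired, along these lines. This is a sketch with made-up function names. It relies on strict decoding, which reliably catches GBK bytes declared as UTF-8; the opposite case (UTF-8 bytes declared as GBK) often decodes without error into garbled characters, so it still needs a visual check.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def resave(data: bytes, actual: str, declared: str) -> bytes:
    """Re-encode bytes from their actual encoding into the declared one."""
    return data.decode(actual).encode(declared)


# A file that was saved as GBK although the page declares UTF-8.
saved = "\u7f16\u7801".encode("gbk")          # the word for "encoding"
print(decodes_as(saved, "utf-8"))             # False: the declaration is wrong for these bytes

fixed = resave(saved, actual="gbk", declared="utf-8")
print(decodes_as(fixed, "utf-8"))             # True
```

`resave` does the same job as Notepad's "Save As", with the advantage that it can be run over many files.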

When using IE as the browser on a Windows operating system, the following problem often occurs: when browsing a page encoded in UTF-8, the browser fails to identify the page's encoding automatically, even though the page declares it with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />. As a result, some UTF-8 pages containing Chinese are rendered blank. The problem does not occur in Firefox or Safari. The reported reason is that IE gives priority to the HTML tags when determining a page's encoding and only then consults the HTTP header, while the Mozilla family of browsers does the opposite.
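Since browsers may consult either the HTTP header or the meta tag, one way to avoid disagreement is to send the same charset in both. The following is a minimal sketch of a WSGI application that does so; the page content and the name `app` are assumptions made for this example, not part of the original article.

```python
# The page declares UTF-8 in its meta tag, placed before <title>.
PAGE = (
    "<!DOCTYPE html><html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
    "<title>\u7f51\u9875\u7f16\u7801</title>"
    "</head><body></body></html>"
)


def app(environ, start_response):
    """WSGI app whose HTTP header names the same charset as the meta tag."""
    body = PAGE.encode("utf-8")                 # the bytes really are UTF-8
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a local test, the app can be served with the standard library's `wsgiref.simple_server.make_server`.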

The cause lies in byte counts: UTF-8 uses three bytes for a Chinese character, while GB2312 and BIG5 use two. When IE reaches <title></title> before it has seen the encoding declaration, it parses the UTF-8 bytes as two-byte characters. If an odd number of full-width characters precedes </title>, the byte count is odd and half a character is left over. That leftover byte is combined with the < of </title> into one garbled character, so IE never sees the end of the <title> element and renders the whole page blank. If you view the source at this point, you will find that the entire page was in fact delivered, but the browser displays none of it. The simplest solution is to put <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> before <title></title>.
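The byte arithmetic behind this failure can be simulated. The sketch below models a naive double-byte parser that treats any byte with the high bit set as the first half of a two-byte character. This is a simplification assumed for illustration, not IE's actual algorithm, and the function name is made up.

```python
def swallows_closing_tag(title: str) -> bool:
    """Return True if a naive double-byte parser would consume the '<' of </title>."""
    data = (title + "</title>").encode("utf-8")
    tag_start = len(title.encode("utf-8"))     # byte index of the '<' in </title>
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            # High bit set: the parser pairs this byte with the next one.
            if i + 1 == tag_start:
                return True                    # the pair swallowed the '<'
            i += 2
        else:
            i += 1                             # plain ASCII byte
    return False


print(swallows_closing_tag("\u4e2d"))          # True: one character, 3 bytes
print(swallows_closing_tag("\u4e2d\u6587"))    # False: two characters, 6 bytes
```

With a title made only of Chinese characters, an odd number of characters gives an odd number of bytes, which is exactly the case the paragraph above describes.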
