GBK or UTF-8: how to choose, understand, and correctly use web page encodings

Web page encoding is the character encoding that a web page is saved in and declares to the browser, so that the browser knows how to turn the page's bytes back into text.
GBK is a standard that extends the national standard GB2312 and remains compatible with it. GBK is a variable-width encoding: ASCII characters such as English letters take one byte, and Chinese characters take two bytes. To distinguish the two, the highest bit of the first byte of every two-byte character is set to 1. GBK covers more than 20,000 Chinese characters, both simplified and traditional, and is a national standard encoding. It is less universal than UTF-8, but Chinese text stored as UTF-8 occupies more database space than the same text stored as GBK.
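As a rough illustration of the byte layout described above, the following Python sketch encodes one English letter and one Chinese character as GBK and checks the highest bit of the first byte. The helper name `describe_gbk` is made up for this example; only Python's built-in `gbk` codec is assumed.

```python
def describe_gbk(ch):
    """Return (byte count, whether the first byte has its highest bit set)
    for a single character encoded as GBK."""
    raw = ch.encode("gbk")
    return len(raw), raw[0] >= 0x80  # 0x80 is binary 1000 0000

print(describe_gbk("A"))   # (1, False): ASCII stays a single byte
print(describe_gbk("中"))  # (2, True): a Chinese character takes two bytes
```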

UTF-8 (Unicode Transformation Format, 8-bit) allows a BOM but usually does not contain one. It is a variable-width, multi-byte encoding designed to handle international characters: it uses 8 bits (one byte) for English letters and 24 bits (three bytes) for most Chinese characters. UTF-8 can encode the characters used by every country in the world, so it is an international encoding with strong versatility. UTF-8 text can be displayed by browsers in any country as long as they support the UTF-8 character set. For example, a UTF-8 page containing Chinese can be displayed in an English-language copy of IE abroad without the user having to download IE's Chinese language support package.
Although the UTF-8 version has good international compatibility, Chinese characters take three bytes in UTF-8 against two in GBK or BIG5, so Chinese text requires 50% more database storage space. For that reason UTF-8 is not recommended for Chinese-only sites and is meant for users with special requirements for international compatibility. Simply put: for websites whose content is mostly Chinese, GBK saves database space; for websites whose content is mostly English, UTF-8 is appropriate, because English letters take one byte in either encoding and UTF-8 adds no storage cost.
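The storage difference can be checked directly. This is a minimal sketch using Python's built-in codecs; the function name `encoded_sizes` and the sample strings are chosen only for illustration.

```python
def encoded_sizes(text):
    """Return the number of bytes the text occupies in GBK and in UTF-8."""
    return {"gbk": len(text.encode("gbk")), "utf-8": len(text.encode("utf-8"))}

chinese = encoded_sizes("网页编码")  # four Chinese characters
english = encoded_sizes("web page")  # eight ASCII characters

print(chinese)  # {'gbk': 8, 'utf-8': 12}
print(english)  # {'gbk': 8, 'utf-8': 8}
```

The 50% figure applies to the Chinese characters themselves. ASCII content such as HTML tags, digits, and English words is the same size in both encodings, so the overhead of a real page depends on how much of it is Chinese.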

How do you convert GBK or GB2312 to UTF-8? GBK, GB2312, and UTF-8 are converted to one another through Unicode: GBK/GB2312 → Unicode → UTF-8, and UTF-8 → Unicode → GBK/GB2312. In Windows Notepad, "Save As" lets you convert a file between ANSI (which is GBK on Simplified Chinese Windows), Unicode, Unicode big endian, and UTF-8.
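The same two-step route can be sketched in Python, where a `str` plays the role of the Unicode intermediate. The function name `transcode` is hypothetical; `bytes.decode` and `str.encode` are the standard built-in methods.

```python
def transcode(data, source, target):
    """Convert bytes from one encoding to another by way of Unicode."""
    text = data.decode(source)   # step 1: source bytes -> Unicode text
    return text.encode(target)   # step 2: Unicode text -> target bytes

gbk_bytes = "中文".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

print(len(gbk_bytes), len(utf8_bytes))  # 4 6
```

Converting in the other direction only works for characters that exist in the target encoding. With the default strict error handling, `encode("gbk")` raises `UnicodeEncodeError` for a character GBK cannot represent.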

How do you make the browser identify the web page encoding correctly? A web page should generally contain a line like this: <meta http-equiv="Content-Type" content="text/html; charset=gb2312">, which declares that the page's character encoding is GB2312 (use charset=UTF-8 for a UTF-8 page).
Why do garbled characters sometimes appear even though the page specifies an encoding? The likely cause is that the encoding the page declares does not match the encoding the file is actually saved in. Most often this happens because the page was opened with the wrong encoding and then saved, or because the file was edited online directly through FTP software such as CuteFTP, which converted it to the wrong encoding because the software's own encoding setting was incorrect. To fix it, open the file in Windows Notepad and use "Save As" to save it in the encoding that the page declares.
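A mismatch between the declared encoding and the file's real encoding can often be spotted mechanically. The sketch below is an assumption-laden illustration, not a full HTML parser: the helper names `declared_charset` and `decodes_as_declared` are made up, and the regular expression only looks for the first `charset=` it finds in the raw bytes.

```python
import re

# Matches charset=gb2312, charset="UTF-8", and similar (simplified pattern).
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE)

def declared_charset(html):
    """Return the first charset named in the raw page bytes, or None."""
    match = _CHARSET_RE.search(html)
    return match.group(1).decode("ascii").lower() if match else None

def decodes_as_declared(html):
    """Return True if the page bytes decode cleanly under the declared charset."""
    charset = declared_charset(html)
    if charset is None:
        return False
    try:
        html.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for that charset, or a charset name Python does not know.
        return False
    return True

head = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
print(decodes_as_declared(head + "中".encode("utf-8")))  # True
print(decodes_as_declared(head + "中".encode("gbk")))    # False
```

A failed decode proves a mismatch, but a successful decode does not prove the declaration is right, because some byte sequences happen to be valid in more than one encoding.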

When IE is used as the browser on Windows, the following problem often occurs: when browsing a UTF-8 page, the browser fails to identify the page's encoding automatically, even though the page declares it with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />. As a result, some UTF-8 pages containing Chinese are rendered as a blank page. The problem does not occur in Firefox or Safari. The reason given is that IE, when parsing a page, gives priority to the HTML tags and only then to the information in the HTTP header, while Mozilla-family browsers do the opposite.

UTF-8 uses three bytes to represent a Chinese character, while GB2312 or BIG5 uses two. For the reason described above, when the browser parses and outputs the content of <title></title> and there is an odd number of full-width characters before </title>, IE reads the UTF-8 bytes two at a time and is left with half of a Chinese character. That leftover half is then combined with the < of </title> into a single garbled character, so IE can no longer find the end of the <title> element and the whole page is rendered empty. If you view the source at this point, you will find that the entire page was in fact delivered, but the browser does not display its content. The simplest solution is to put <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> before <title></title>.
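The effect of the odd character count can be modelled with a deliberately naive splitter that treats every byte with the highest bit set as the start of a two-byte character. This is only a toy model of the behaviour described above, not IE's actual parser, and the function name `naive_double_byte_split` is made up for the example.

```python
def naive_double_byte_split(data):
    """Split bytes the way a simple double-byte decoder would:
    a byte >= 0x80 starts a two-byte unit, any other byte stands alone."""
    units = []
    i = 0
    while i < len(data):
        step = 2 if data[i] >= 0x80 else 1
        units.append(data[i:i + step])
        i += step
    return units

odd = naive_double_byte_split("中".encode("utf-8") + b"</title>")
even = naive_double_byte_split("中文".encode("utf-8") + b"</title>")

print(b"<" in odd)   # False: the '<' was swallowed into the unit b'\xad<'
print(b"<" in even)  # True: six bytes pair up evenly and '<' survives
```

One Chinese character is three bytes, so an odd number of them leaves a single high byte over, and the splitter pairs that byte with the `<` that follows. With an even number the bytes pair up exactly and the closing tag stays intact.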
