Why web page encoding uses utf-8 instead of gbk or gb2312?

If you have a choice, you should use UTF-8.

In fact, Windows' own programs have long since switched to Unicode internally, and GBK is essentially a transitional encoding kept to satisfy Chinese standards.

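GBK is a mixed single-byte/double-byte encoding. English letters and other ASCII characters take one byte, exactly as in ASCII, while each Chinese character takes two bytes. To tell the two apart, the first byte of a Chinese character has its highest bit set to 1 (a value from 0x81 to 0xFE).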

UTF-8, for its part, is a variable-length (multi-byte) encoding of Unicode designed for international text. It uses 8 bits (one byte) for English and other ASCII characters and 24 bits (three bytes) for most Chinese characters, so English text in UTF-8 is as compact as plain ASCII.
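A quick way to see UTF-8's variable length is to encode single characters and count the bytes. The sketch below (the helper `utf8_lengths` is hypothetical) also shows that characters outside the Basic Multilingual Plane, such as emoji, take four bytes:

```python
def utf8_lengths(text):
    """Map each character of `text` to the number of bytes UTF-8 uses for it."""
    return {ch: len(ch.encode("utf-8")) for ch in text}


print(utf8_lengths("Aé中😀"))
# {'A': 1, 'é': 2, '中': 3, '😀': 4}
```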

GBK covers essentially all Chinese characters in everyday use.

UTF-8 can encode every Unicode character, which covers the scripts of virtually every country in the world.
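The difference in coverage shows up as an encoding error. A minimal check, assuming Python's standard codecs (the function name `encodable` is illustrative only):

```python
def encodable(text, codec):
    """Return True if every character in `text` can be stored in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for sample in ["中文", "😀"]:
    print(sample, "gbk:", encodable(sample, "gbk"), "utf-8:", encodable(sample, "utf-8"))
# 中文 gbk: True utf-8: True
# 😀 gbk: False utf-8: True
```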

GBK is an extension of the national standard GB2312 and stays backward compatible with it. GBK itself was issued as a technical specification rather than as a formal national standard.
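Backward compatibility means that text made only of GB2312 characters produces the same bytes under either name. A small sketch with Python's `gb2312` and `gbk` codecs (the helper name is hypothetical):

```python
def same_bytes_in_gb2312_and_gbk(text):
    """True if `text` encodes to identical bytes under GB2312 and GBK."""
    return text.encode("gb2312") == text.encode("gbk")


print(same_bytes_in_gb2312_and_gbk("中文编码"))  # True
```

Because the bytes match, an existing GB2312 file can be read as GBK without conversion.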

UTF-8 text can be displayed by browsers in any country, as long as they support UTF-8.
For example, a UTF-8 page shows Chinese correctly in a foreign user's English version of IE, without the user having to download IE's Chinese language support pack.

As for storage, English characters take one byte in both GBK and UTF-8, so English-heavy forums need about the same space either way. The difference lies in Chinese characters, which take two bytes in GBK but three bytes in UTF-8.

Please note: although the UTF-8 version has good international compatibility, storing Chinese text in it takes 50% more database space than the GBK/BIG5 versions, because each Chinese character needs three bytes instead of two. It is therefore not recommended except for users who have specific international compatibility requirements.

In simple terms:
For forums with a lot of Chinese text, it is appropriate to use GBK encoding to save database space.
For forums with mostly English content, UTF-8 costs no extra database space, so it is the natural choice.
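The storage trade-off can be measured by encoding the same text both ways. This sketch (the helper `storage_bytes` is hypothetical) shows English text costing the same in both encodings and Chinese text costing 50% more in UTF-8:

```python
def storage_bytes(text):
    """Bytes needed to store `text` in UTF-8 and in GBK."""
    return {"utf-8": len(text.encode("utf-8")), "gbk": len(text.encode("gbk"))}


print(storage_bytes("Hello, forum"))  # {'utf-8': 12, 'gbk': 12}
print(storage_bytes("中文论坛"))       # {'utf-8': 12, 'gbk': 8}
```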

What are the differences between GBK and GB2312?

First of all, we need to understand what GBK and GB2312 are. Both are character encodings, and of course there are many other kinds of character encoding.

We can understand character encoding as follows:

Computers store everything as binary values, 0s and 1s.

Eight bits make one byte, and byte values are usually written in hexadecimal.

So how do we get the computer to display the characters we want, instead of strings of 0s and 1s?

The computer has to convert the stored values (written in hexadecimal) into the corresponding characters, whether English, Chinese, or any other language, and then output them to the screen.

So a character encoding is a set of rules specifying which of the values stored in the computer corresponds to which character displayed on the screen.

To sum up, GBK and GB2312 are both character encodings of this kind.

Let's look at their similarities and differences in detail below:

Similarities:

1. GBK and GB2312 both use 16 bits (two bytes) for each Chinese character, while ASCII characters take one byte!

2. Both are commonly declared as the charset in the meta tags of web pages.
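What matters for a meta tag is that the declared charset matches the bytes actually saved. A minimal sketch, assuming a hypothetical `render_page` helper; `body` is inserted without HTML escaping, so this is for illustration only:

```python
def render_page(body, charset):
    """Build a minimal HTML page and encode it in the charset its meta tag declares."""
    html = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{charset}"><title>demo</title></head>\n'
        f"<body>{body}</body></html>\n"
    )
    # Saving the page in the same codec the meta tag names keeps the two in
    # agreement; a mismatch between them is a common cause of garbled pages.
    return html.encode(charset)


page = render_page("你好", "gbk")
print(page.decode("gbk"))
```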

Differences:

1. GBK character encoding supports Simplified Chinese and Traditional Chinese!

GBK stands for the "Chinese Internal Code Extension Specification". The letters G, B and K are the pinyin initials of "Guojia Biaozhun" (national standard) and "Kuozhan" (extension), and its English name is the Chinese Internal Code Specification. It was drawn up by the National Technical Committee of Information Technology Standardization of the People's Republic of China on December 1, 1995. On December 15, 1995, the Standardization Department of the State Administration of Technical Supervision and the Science and Technology and Quality Supervision Department of the Ministry of Electronics Industry jointly designated it a guiding technical specification document, in Technical Supervision Letter [1995] No. 229.

2. GB2312 only supports Simplified Chinese!

GB2312, the "Chinese Character Coded Character Set for Information Interchange", is a national standard issued by the General Administration of Standards of China in 1980 and put into effect on May 1, 1981. Its standard number is GB 2312-1980.
GB 2312 contains 6,763 Chinese characters in total: 3,755 level-1 characters and 3,008 level-2 characters. It also includes 682 full-width characters, among them Latin letters, Greek letters, Japanese Hiragana and Katakana, and Russian Cyrillic letters.
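The Simplified-only scope of GB2312 is easy to observe with a traditional character. In this sketch, 龙 and 龍 are the simplified and traditional forms of "dragon"; GBK can store both, while GB2312 has no code for 龍, so the `replace` error handler substitutes `?`:

```python
sample = "龙龍"  # simplified and traditional forms of "dragon"

# GBK stores both forms, so the text survives a round trip.
gbk_ok = sample.encode("gbk").decode("gbk") == sample
# GB2312 has no code for the traditional form; "replace" substitutes b"?".
gb2312_bytes = sample.encode("gb2312", errors="replace")

print(gbk_ok)                                        # True
print(gb2312_bytes == "龙".encode("gb2312") + b"?")  # True
```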

If your web pages are aimed mainly at Chinese-speaking readers, GB2312 or GBK works very well: the stored text is smaller, which has its advantages. If your pages are meant to be viewed worldwide, however, some browsers and computers do not support these encodings, and the Chinese characters on your pages will turn into unreadable garbled text.
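Garbled text appears whenever bytes are decoded with a different encoding than the one they were written in. A small sketch of that failure, assuming the page was saved as UTF-8 and read as GBK (the exact garbage is not important, only that it differs from the original):

```python
original = "中文网页"
raw = original.encode("utf-8")                 # bytes as saved on the server
misread = raw.decode("gbk", errors="replace")  # reader assumes the wrong charset

print(misread == original)              # False: the reader sees garbled text
print(raw.decode("utf-8") == original)  # True: the right charset restores it
```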
