First of all, GB2312, GBK and UTF-8 are all character encodings. Many other character encodings exist; these three are simply the ones most commonly used on Chinese websites.

Why is an encoding needed at all? A computer stores text as numbers, so every character has to be assigned a unique numeric code. Computers were first developed in the United States, where the letters, digits and punctuation on a keyboard fit easily into ASCII, a code with only 128 values. Chinese is different: there are thousands of characters and each one needs its own code, which ASCII cannot provide. This is how the national character encoding standards GB2312, GBK and others came into being. Other countries and languages have their own encoding standards as well.

GB stands for "guobiao", meaning national standard. GB2312 and GBK are mainly used for encoding Chinese characters, while UTF-8 is used worldwide. If your web pages are mainly for readers of Simplified Chinese, GB2312 and GBK work well and have the advantage of compact storage: a Chinese character takes two bytes, against three in UTF-8. If your pages are meant to be read around the world and you use GB2312 or GBK as the page encoding, some systems and browsers may not support that encoding, and the Chinese characters on the page will turn into unreadable garbled text.

The encoding is usually declared in a meta tag of the web page, for example a tag such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312">, which indicates that the page uses GB2312. This information is for the browser, which gives priority to the encoding declared in the head of the page when decoding it. You can also force the browser to interpret the page with a different encoding, which is an easy way to see the notorious garbled text for yourself.

GBK, GB2312 and similar encodings cannot be converted to or from UTF-8 directly. The conversion has to go through Unicode: GBK/GB2312 <-> Unicode <-> UTF-8.

For a website or forum whose content is mostly English, UTF-8 is recommended: English letters take one byte each, just as in GBK, so the three-byte cost of Chinese characters matters little. However, many forum plug-ins currently support only GBK. Another benefit of UTF-8 is that users in other regions (such as Hong Kong and Taiwan) can read your text normally, without garbled characters, even if Simplified Chinese support is not installed.
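The conversion path through Unicode, and the garbled text produced by decoding with the wrong encoding, can be sketched in Python. This is only an illustration: it assumes Python's built-in "gbk" and "utf-8" codecs, and the function name transcode and the sample string are hypothetical.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between encodings, using a Unicode str as the pivot."""
    text = data.decode(source)   # source bytes -> Unicode
    return text.encode(target)   # Unicode -> target bytes


original = "中文网页"

# Four Chinese characters: two bytes each in GBK, three bytes each in UTF-8.
gbk_bytes = original.encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(len(gbk_bytes), len(utf8_bytes))  # 8 12

# Decoding UTF-8 bytes as if they were GBK yields garbled text.
# errors="replace" substitutes a marker for byte pairs GBK cannot map.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled)

# Decoding with the encoding that was actually used restores the text.
print(utf8_bytes.decode("utf-8"))
```

A Python str is already a sequence of Unicode code points, so decode followed by encode is exactly the GBK -> Unicode -> UTF-8 route described above.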
The most widely used encoding standard in mainland China is GB18030; GBK and GB2312 are also in use. The relationship between them is as follows. The earliest Chinese character code was GB2312, which includes 6,763 Chinese characters and 682 other symbols. It was extended in 1995 under the name GBK 1.0, which includes 21,886 characters in total. Later GB18030 was introduced, which includes 27,484 Chinese characters as well as the scripts of major minority languages such as Tibetan, Mongolian and Uyghur. The Windows platform is now required to support GB18030.

GB2312 contains about 6,000 Chinese characters (not counting symbols). Chinese characters use B0-F7 for the first byte and A1-FE for the second byte (when the first byte is D7, the second byte only runs from A1 to F9). That gives 72 x 94 - 5 = 6,763 Chinese characters. Together with the 682 other symbols, the standard defines 7,445 characters.

GBK is an extension of GB2312. It accommodates more Chinese characters, but it is only an extension, with no fundamental change: all GB2312 codes are retained and the code range is enlarged around them. It holds roughly 22,000 characters in total (including symbols).

GB18030 is in turn an extension of GBK. Because two bytes can no longer hold all the Chinese characters required, it mixes two-byte and four-byte codes (plus single bytes for ASCII). It keeps the original two-byte GBK codes, so files encoded in GB2312 or GBK remain compatible. Figures of approximately 55,657 characters (including special characters) are quoted for it, and its four-byte range is large enough to cover every Unicode code point.

Unicode, commonly known as the universal code, aims to express the writing of all countries with one unified standard. Unicode itself is a character set; UTF-8 is one way of encoding it as bytes. UTF-8 is a variable-length encoding that uses one to four bytes per character: ASCII takes one byte and most Chinese characters take three.
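The GB2312 range arithmetic and the byte widths of the different encodings can be checked with a few lines of Python. This is a sketch under stated assumptions: it relies on Python's built-in "gb2312", "gbk", "gb18030" and "utf-8" codecs, and the names encoded_length, SAMPLE and RARE are hypothetical, chosen only for illustration.

```python
# GB2312 Chinese characters: first byte 0xB0-0xF7, second byte 0xA1-0xFE.
LEAD_BYTES = range(0xB0, 0xF7 + 1)    # 72 possible first bytes
TRAIL_BYTES = range(0xA1, 0xFE + 1)   # 94 possible second bytes
UNUSED_IN_ROW_D7 = 5                  # 0xD7FA-0xD7FE are left empty

hanzi_count = len(LEAD_BYTES) * len(TRAIL_BYTES) - UNUSED_IN_ROW_D7
print(hanzi_count)  # 6763


def encoded_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))


SAMPLE = "中"          # a common character, present in all four encodings
RARE = "\U00020000"    # a character outside GBK, in a supplementary plane

for encoding in ("gb2312", "gbk", "gb18030", "utf-8"):
    print(encoding, encoded_length(SAMPLE, encoding))

# GB18030 falls back to a four-byte code for characters beyond GBK.
print(encoded_length(RARE, "gb18030"))  # 4
```

The common character takes two bytes in each GB encoding and three in UTF-8, which is the storage difference the article refers to.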
Unicode covers every character in GBK and many more, so UTF-8 can represent a wider range of Chinese characters than GBK. Because Chinese is stored in three bytes rather than two, however, UTF-8 is not byte-compatible with the GB family: existing files encoded in GBK, GB2312 or GB18030 cannot be read as UTF-8 and must be converted first, so migrating old content still takes work.

What are the differences between GBK and GB2312?

First of all, what is GBK and what is GB2312? Both are character encodings, and there are many other character encodings as well. Character encoding can be understood as follows. A computer stores binary values of 0 and 1. Eight bits make up one byte, which is usually written in hexadecimal. To show the characters we want on the screen rather than raw numbers, the computer has to convert the stored values into the corresponding characters, whether English, Chinese or any other language, and then output them. An encoding is therefore a set of rules that specifies which stored value corresponds to which character displayed on the screen.

To sum up, GBK and GB2312 are both character encodings. Their similarities and differences are described below.

Similarities:
1. GBK and GB2312 both use two bytes (16 bits) for each Chinese character.
2. Both are usually declared in the meta tags of web pages.

Differences:
1. GBK supports both Simplified Chinese and Traditional Chinese. GBK stands for "Chinese Internal Code Extension Specification" (the letters GBK are the pinyin initials of "guobiao kuozhan", that is, "national standard" and "extension"; its English name is Chinese Internal Code Specification). It was formulated by the National Technical Committee of Information Technology Standardization of the People's Republic of China on December 1, 1995. On December 15, 1995, the Standardization Department of the State Administration of Technical Supervision and the Science and Technology and Quality Supervision Department of the Ministry of Electronics Industry jointly designated it a technical specification guiding document in Technical Supervision Letter 1995 No. 229.
2. GB2312 supports only Simplified Chinese. "Chinese Character Coded Character Set for Information Interchange" is a national standard issued by the General Administration of Standards of China in 1980 and implemented on May 1, 1981. The standard number is GB 2312-1980.

If your web pages are mainly for readers of Simplified Chinese, GB2312 and GBK work well, and the text takes less storage space. If your pages are meant to be read around the world and you use GB2312 or GBK as the page encoding, some systems and browsers may not support that encoding, and the Chinese characters on the page will turn into unreadable garbled text.
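The difference in coverage between GB2312 and GBK can be observed directly. The sketch below assumes Python's built-in "gb2312" and "gbk" codecs; the helper name can_encode and the sample words are hypothetical and only serve the illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉字"    # "Chinese characters" written in Simplified Chinese
traditional = "漢字"   # the same word written in Traditional Chinese

for encoding in ("gb2312", "gbk"):
    print(encoding,
          can_encode(simplified, encoding),
          can_encode(traditional, encoding))
```

GB2312 accepts the Simplified form only, while GBK accepts both, which matches the two differences listed above.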