1Several common character sets In MySQL, the most common character sets include ASCII character set, latin character set, GB2312 character set, GBK character set, UTF8 character set, etc. Let us briefly introduce these character sets: ASCII character set This character set uses 1 byte for encoding. One byte has 8 bits and can store a total of 128 characters. The specific correspondence is as follows: Latin character set The Latin character set can store a total of 256 characters. Compared with the ASCII code, it contains 128 commonly used characters in Western Europe. GB2312 character set It includes Chinese characters, Latin characters, Greek characters, etc., among which Chinese characters account for the majority, with 6763, and other text symbols are 638, and it is compatible with ASCII characters. When encoding ASCII, it uses 1 byte, which is 128 bits, and when encoding other characters, it uses 2 bytes. It can be understood that it is a side length encoding method. GBK character set This character set is an extension of the GB2312 character set. It is compatible with the GB2312 character set and is generally encoded using two bytes. UTF8 character set It is usually encoded by 1 to 4 bytes. According to the different bytes used, it can also be divided into UTF8 and utf8mb4. mb4 means up to 4 bytes. Generally speaking, UTF8 uses three bytes for encoding. In addition, there are utf16 and utf32. Utf16 uses 2 or 4 bytes to encode a character, and utf32 uses 4 bytes to encode a character. It should be noted that some emoji expressions need to be represented using utf8mb4. 2 Character sets supported in MySQL MySQL supports many character sets. Taking my local character set as an example, I use the show charset command to view the character sets supported by the current server. The results are as follows: [email protected]:(none) 22:46:48>>show charset; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 | | swe7 | 7bit Swedish | swe7_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | ujis | EUC-JP Japanese | ujis_japanese_ci | 3 | | sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 | | hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 | | tis620 | TIS620 Thai | tis620_thai_ci | 1 | | euckr | EUC-KR Korean | euckr_korean_ci | 2 | | koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | greek | ISO 8859-7 Greek | greek_general_ci | 1 | | cp1250 | Windows Central European | cp1250_general_ci | 1 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 | | armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | | cp866 | DOS Russian | cp866_general_ci | 1 | | keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 | | macce | Mac Central European | macce_general_ci | 1 | | macroman | Mac West European | macroman_general_ci | 1 | | cp852 | DOS Central European | cp852_general_ci | 1 | | latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 | | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 | | cp1251 | Windows Cyrillic | cp1251_general_ci | 1 | | utf16 | UTF-16 Unicode | utf16_general_ci | 4 | | cp1256 | Windows Arabic | cp1256_general_ci | 1 | | cp1257 | Windows Baltic | cp1257_general_ci | 1 | | utf32 | UTF-32 Unicode | utf32_general_ci | 4 | | binary | Binary pseudo charset | binary | 1 | | geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 | | cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 | | eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 | +----------+-----------------------------+---------------------+--------+ 39 rows in set (0.00 sec) As you can see, a total of 39 character sets are supported. Of course, this number varies in different versions. Among these 39 character sets, the main ones we need to remember are the following: +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | latin1 | cp1252 West European | latin1_swedish_ci | 1 | | ascii | US ASCII | ascii_general_ci | 1 | | gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 | | gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 | | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 | +----------+-----------------------------+---------------------+--------+ Here, we only need to know how many bytes their maximum length contains. 3 Verification rules The validation rule of a character set refers to the criterion used when comparing the size of a character set. For example, if we compare the size of a and B, if we do not consider the case, then a<B. If we consider the case, then a>B. In other words, the same character set but different comparison rules will produce different sorting results for a column of data. The character validation rules in MySQL can be viewed using the show collation; syntax, as follows: [email protected]:(none) 23:00:36>>show collation; +-----------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +-----------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | | dec8_swedish_ci | dec8 | 3 | Yes | Yes | 1 | | dec8_bin | dec8 | 69 | | Yes | 1 | ........... | koi8r_general_ci | koi8r | 7 | Yes | Yes | 1 | | koi8r_bin | koi8r | 74 | | Yes | 1 | | latin1_german1_ci | latin1 | 5 | | Yes | 1 | | koi8u_general_ci | koi8u | 22 | Yes | Yes | 1 | | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | | utf8_unicode_ci | utf8 | 192 | | Yes | 8 | | utf8_icelandic_ci | utf8 | 193 | | Yes | 8 | | utf8_latvian_ci | utf8 | 194 | | Yes | 8 | | utf8_romanian_ci | utf8 | 195 | | Yes | 8 | | utf8_slovenian_ci | utf8 | 196 | | Yes | 8 | | utf8_polish_ci | utf8 | 197 | | Yes | 8 | | utf8_estonian_ci | utf8 | 198 | | Yes | 8 | +-----------------------+----------+-----+---------+----------+---------+ 195 rows in set (0.00 sec) We can see that there are a total of 195 comparison rules in the results. Each character set contains its own default validation rule. Let us briefly explain one: utf8_polish_ci compares based on Polish. This validation rule consists of three parts. The comparison rule name starts with the name of the character set associated with it. utf8 refers to the comparison rule of the utf8 character set, polish refers to Polish, and _ci refers to case-insensitive. For the last suffix, we can summarize as follows: _ai is accent-insensitive _as is accent sensitive _ci case insensitive (insensitive) not case sensitive _cs case sensitive (sensitive) case sensitive _bin binary In the above results, we can also see that the value of the default column of some validation rules is yes, which means that this validation rule is the default validation rule for the character set. 4 Character sets and comparison rules on the server MySQL provides two system variables to represent the server-level character set and comparison rules, one is character_set_sever and the other is collation_server. We can view their default values: [email protected]:(none) 23:12:28>>show variables like 'character_set_server'; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | character_set_server | utf8 | +----------------------+-------+ 1 row in set (0.00 sec) [email protected]:(none) 23:12:47>>show variables like 'collation_server'; +------------------+-----------------+ | Variable_name | Value | +------------------+-----------------+ | collation_server | utf8_general_ci | +------------------+-----------------+ 1 row in set (0.00 sec) [email protected]:(none) 23:12:57>> In MySQL, character sets and validation rules are divided into four levels: server level, database level, table level, and field level. Here we take examples to see them respectively: Database levelmysql 23:23:48>>create database yyz character set utf8 collate utf8_general_ci; Query OK, 1 row affected (0.01 sec) Table levelmysql 23:24:23>>create table yyz.yyz (name varchar(10)) charset gb2312 collate gb2312_chinese_ci; Query OK, 0 rows affected (0.04 sec) Field levelmysql 23:27:06>> [email protected]:(none) 23:27:06>>alter table yyz.yyz add num varchar(10) charset gbk collate gbk_chinese_ci; Query OK, 0 rows affected (0.11 sec) Records: 0 Duplicates: 0 Warnings: 0 View the resultsmysql 23:28:27>>show create table yyz.yyz; +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | yyz | CREATE TABLE `yyz` ( `name` varchar(10) DEFAULT NULL, `num` varchar(10) CHARACTER SET gbk DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=gb2312 | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) Since the character set and comparison rules are related to each other, if we only modify the character set, the comparison rules will also change. If we only modify the comparison rules, the character set will also change. The specific rules are as follows: If you only modify the character set, the comparison rules will become the default comparison rules of the modified character set. There are 4 more things to remember:
The above is a detailed explanation of the character set and validation rules in MySQL. For more information about the character set and validation rules in MySQL, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
>>: What is dns-prefetch? Front-end optimization: DNS pre-resolution improves page speed
Table of contents Preface 1. Download a single fi...
principle The principle of anti-shake is: you can...
This article records the installation of MySQL 8....
In the past, I only knew how to use the name attri...
Table of contents Preliminary work Backend constr...
According to the principles of W3C, each start tag...
SQL implements addition, subtraction, multiplicat...
In fact, this problem has already popped up when I...
Table of contents Same Origin Policy Ajax request...
NAT In this way, the virtual machine's networ...
Table of contents 1. Overview 1.1 What is strict ...
Margin of parallel boxes (overlap of double margi...
This article example shares the specific code of ...
CSS plays a very important role in a web page. Wi...
Table of contents 1. Joint index description 2. C...