Summary of MySQL character sets

Summary of MySQL character sets

This note mainly records the learning of MySQL character set related knowledge

Character Set

Generally, we use the following statement to create a table, and specify charset as utf8 encoding when creating the table.

CREATE TABLE `test` (
 `c1` int UNSIGNED NOT NULL AUTO_INCREMENT,
 `c2` varchar(255),
 `c3` char(11),
 `c4` varchar(255),
 PRIMARY KEY (`c1`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

But in fact, MySQL also supports character sets other than utf8. You can use the command show charset to view all character sets currently supported by MySQL.

For example, the following figure shows all the character sets supported by my MySQL.

As can be seen from the figure above, the fourth column indicates the maximum length of the character set. The maximum length of utf8, which we often use when creating tables, in MySQL is actually 3 characters. It is generally believed that utf8 uses 1 to 4 characters to represent specific content, but utf8 in MySQL is actually an alias for utf8mb3. The authentic one that uses up to 4 characters in MySQL is utf8mb4.

Notice:

utf8mb4
utf8mb4

Comparison Rules

In the above picture, the third column Default collation is the default comparison rule of these character sets. For example, the default comparison rule for the acsii character set is ascii_general_ci, and the default comparison rule for the utf8 character set is utf8_general_ci.

In fact, each character set has many comparison rules. For example, let's take a look at the comparison rules of the acsii character set, as shown in the following figure

mysql> show collation like 'ascii%';
+------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------+---------+----+---------+----------+---------+
| ascii_general_ci | ascii | 11 | Yes | | 0 |
| ascii_bin | ascii | 65 | | | 0 |
+------------------+---------+----+---------+----------+---------+

ascii_general_ci indicates a general comparison, where _ci means case insensitive. (On the contrary, _cs means case-sensitive)

ascii_bin means binary comparison

Let's take a look at the comparison rules of utf8 and utf8mb4 character sets, as shown in the following figure

mysql> show collation like 'utf8%';
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_german2_ci | utf8 | 212 | | Yes | 8 |
| utf8_croatian_ci | utf8 | 213 | | Yes | 8 |
| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 |
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+--------------------------+---------+-----+---------+----------+---------+

Using different comparison rules to compare data may result in different results.

The character set and comparison rules correspond to each other. No matter you modify only the character set or only the comparison rules, the other will automatically change accordingly.

Four levels of character sets and comparison rules

The character set and comparison rules are mentioned above. In fact, there are different levels of character sets and comparison rules in MySQL:

  • Server Level
  • Database level
  • Table Level
  • Column level

For server-level character sets, see the parameter character_set_server , and for server-level comparison rules, see the parameter collation_server

mysql> show variables like 'character_set_server';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_server';
+------------------+-----------------+
| Variable_name | Value |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules at the database level, you can first use xxxdb to switch to a specific database, and then execute show variables like 'character_set_database' and show variables like 'collation_database' to view the character set and comparison rules of the database.

Database changed
mysql> show variables like 'character_set_database';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_database';
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules of a table or a column in a table, you can use show create table xxxtb to view the specific table creation statement, which contains the character set and comparison rules used.

If the character set and comparison rules are not specified when the database is created, the server-level character set and comparison rules will be used as the database's character set and comparison rules. If the character set and comparison rules are not specified in the statement for creating or modifying a table, the character set and comparison rules of the database where the table is located will be used as the character set and comparison rules of the table.

Three system variables

The MySQL server can correspond to multiple clients, and the encoding methods of these clients may be different, but the character set of the data in the database is certain. MySQL must perform encoding conversion processing for different clients and then return the results to the client.

In MySQL, there are three SESSION-level system variables that can perform the above operations:

  • character_set_client : The character set used by the server to decode client requests
  • character_set_connection: When the server processes the request, it converts the request string from character_set_client to character_set_connection
  • character_set_results : The character set used by the server when returning data to the client

The MySQL server maintains a copy of these three variables at the session level for each client.

However, in actual production environments, in order to avoid trouble, companies generally have regulations that require clients to use the utf8 character set by default.

The above is the detailed summary of MySQL character set. For more information about MySQL character set, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Practical tutorial on modifying MySQL character set
  • Detailed explanation of MYSQL character set setting method (terminal character set)
  • Detailed explanation of character sets and validation rules in MySQL
  • MySQL character set viewing and modification tutorial
  • How to modify the MySQL character set
  • Causes and solutions to the garbled character set problem in MySQL database
  • How to change MySQL character set utf8 to utf8mb4
  • How to unify the character set on an existing mysql database
  • MySQL character set garbled characters and solutions
  • Detailed explanation of JDBC's processing of Mysql utf8mb4 character set
  • How to change the default character set of MySQL to utf8 on MAC

<<:  Example code for implementing equal height layout in multiple ways with CSS

>>:  Solution to the failure of entering the container due to full docker space

Recommend

Detailed tutorial on installing CUDA9.0 on Ubuntu16.04

Preface: This article is based on the experience ...

MySql index improves query speed common methods code examples

Use indexes to speed up queries 1. Introduction I...

Linux completely removes node.js and reinstalls it through the yum command

first step Delete it once with the built-in packa...

Solve the problem of mysql data loss when docker restarts redis

Official documentation: So mysql should be starte...

Use of Linux ipcs command

1. Command Introduction The ipcs command is used ...

MySQL date processing function example analysis

This article mainly introduces the example analys...

How to install MySQL 5.7 on Ubuntu and configure the data storage path

1. Install MySQL This article is installed via AP...

Detailed explanation of redo log and undo log in MySQL

The most important logs in the MySQL log system a...

How to use resident nodes for layer management in CocosCreator

CocosCreator version: 2.3.4 Most games have layer...

Sitemesh tutorial - page decoration technology principles and applications

1. Basic Concepts 1. Sitemesh is a page decoratio...

vue-cropper component realizes image cutting and uploading

This article shares the specific code of the vue-...