Summary of MySQL character sets

Summary of MySQL character sets

This note mainly records the learning of MySQL character set related knowledge

Character Set

Generally, we use the following statement to create a table, and specify charset as utf8 encoding when creating the table.

CREATE TABLE `test` (
 `c1` int UNSIGNED NOT NULL AUTO_INCREMENT,
 `c2` varchar(255),
 `c3` char(11),
 `c4` varchar(255),
 PRIMARY KEY (`c1`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

But in fact, MySQL also supports character sets other than utf8. You can use the command show charset to view all character sets currently supported by MySQL.

For example, the following figure shows all the character sets supported by my MySQL.

As can be seen from the figure above, the fourth column indicates the maximum length of the character set. The maximum length of utf8, which we often use when creating tables, in MySQL is actually 3 characters. It is generally believed that utf8 uses 1 to 4 characters to represent specific content, but utf8 in MySQL is actually an alias for utf8mb3. The authentic one that uses up to 4 characters in MySQL is utf8mb4.

Notice:

utf8mb4
utf8mb4

Comparison Rules

In the above picture, the third column Default collation is the default comparison rule of these character sets. For example, the default comparison rule for the acsii character set is ascii_general_ci, and the default comparison rule for the utf8 character set is utf8_general_ci.

In fact, each character set has many comparison rules. For example, let's take a look at the comparison rules of the acsii character set, as shown in the following figure

mysql> show collation like 'ascii%';
+------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------+---------+----+---------+----------+---------+
| ascii_general_ci | ascii | 11 | Yes | | 0 |
| ascii_bin | ascii | 65 | | | 0 |
+------------------+---------+----+---------+----------+---------+

ascii_general_ci indicates a general comparison, where _ci means case insensitive. (On the contrary, _cs means case-sensitive)

ascii_bin means binary comparison

Let's take a look at the comparison rules of utf8 and utf8mb4 character sets, as shown in the following figure

mysql> show collation like 'utf8%';
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_german2_ci | utf8 | 212 | | Yes | 8 |
| utf8_croatian_ci | utf8 | 213 | | Yes | 8 |
| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 |
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+--------------------------+---------+-----+---------+----------+---------+

Using different comparison rules to compare data may result in different results.

The character set and comparison rules correspond to each other. No matter you modify only the character set or only the comparison rules, the other will automatically change accordingly.

Four levels of character sets and comparison rules

The character set and comparison rules are mentioned above. In fact, there are different levels of character sets and comparison rules in MySQL:

  • Server Level
  • Database level
  • Table Level
  • Column level

For server-level character sets, see the parameter character_set_server , and for server-level comparison rules, see the parameter collation_server

mysql> show variables like 'character_set_server';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_server';
+------------------+-----------------+
| Variable_name | Value |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules at the database level, you can first use xxxdb to switch to a specific database, and then execute show variables like 'character_set_database' and show variables like 'collation_database' to view the character set and comparison rules of the database.

Database changed
mysql> show variables like 'character_set_database';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_database';
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules of a table or a column in a table, you can use show create table xxxtb to view the specific table creation statement, which contains the character set and comparison rules used.

If the character set and comparison rules are not specified when the database is created, the server-level character set and comparison rules will be used as the database's character set and comparison rules. If the character set and comparison rules are not specified in the statement for creating or modifying a table, the character set and comparison rules of the database where the table is located will be used as the character set and comparison rules of the table.

Three system variables

The MySQL server can correspond to multiple clients, and the encoding methods of these clients may be different, but the character set of the data in the database is certain. MySQL must perform encoding conversion processing for different clients and then return the results to the client.

In MySQL, there are three SESSION-level system variables that can perform the above operations:

  • character_set_client : The character set used by the server to decode client requests
  • character_set_connection: When the server processes the request, it converts the request string from character_set_client to character_set_connection
  • character_set_results : The character set used by the server when returning data to the client

The MySQL server maintains a copy of these three variables at the session level for each client.

However, in actual production environments, in order to avoid trouble, companies generally have regulations that require clients to use the utf8 character set by default.

The above is the detailed summary of MySQL character set. For more information about MySQL character set, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Practical tutorial on modifying MySQL character set
  • Detailed explanation of MYSQL character set setting method (terminal character set)
  • Detailed explanation of character sets and validation rules in MySQL
  • MySQL character set viewing and modification tutorial
  • How to modify the MySQL character set
  • Causes and solutions to the garbled character set problem in MySQL database
  • How to change MySQL character set utf8 to utf8mb4
  • How to unify the character set on an existing mysql database
  • MySQL character set garbled characters and solutions
  • Detailed explanation of JDBC's processing of Mysql utf8mb4 character set
  • How to change the default character set of MySQL to utf8 on MAC

<<:  Example code for implementing equal height layout in multiple ways with CSS

>>:  Solution to the failure of entering the container due to full docker space

Recommend

Detailed process of installing various software in Docker under Windows

1. Install MySQL # Download mysql in docker docke...

mysql IS NULL using index case explanation

Introduction The use of is null, is not null, and...

A brief discussion on creating cluster in nodejs

Table of contents cluster Cluster Details Events ...

Minio lightweight object storage service installation and browser usage tutorial

Table of contents Introduction Install 1. Create ...

A Brief Analysis of MySQL PHP Syntax

Let's first look at the basic syntax of the c...

How to design the homepage of Tudou.com

<br />I have been working in front-end for s...

Detailed explanation of MySQL partition table

Preface: Partitioning is a table design pattern. ...

MySQL helps you understand index pushdown in seconds

Table of contents 1. The principle of index push-...

Vue batch update dom implementation steps

Table of contents Scene Introduction Deep respons...

CentOS 7 method to modify the gateway and configure the IP example

When installing the centos7 version, choose to co...

A record of the pitfalls of the WeChat applet component life cycle

The component lifecycle is usually where our busi...

How to use IDEA to create a web project and publish it to tomcat

Table of contents Web Development 1. Overview of ...

How to prevent duplicate submission in jquery project

In new projects, axios can prevent duplicate subm...

Interpretation of the module for load balancing using nginx

Table of contents Two modules for using nginx for...

How to install Docker and configure Alibaba Cloud Image Accelerator

Docker Installation There is no need to talk abou...