Summary of MySQL character sets

Summary of MySQL character sets

This note mainly records the learning of MySQL character set related knowledge

Character Set

Generally, we use the following statement to create a table, and specify charset as utf8 encoding when creating the table.

CREATE TABLE `test` (
 `c1` int UNSIGNED NOT NULL AUTO_INCREMENT,
 `c2` varchar(255),
 `c3` char(11),
 `c4` varchar(255),
 PRIMARY KEY (`c1`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;

But in fact, MySQL also supports character sets other than utf8. You can use the command show charset to view all character sets currently supported by MySQL.

For example, the following figure shows all the character sets supported by my MySQL.

As can be seen from the figure above, the fourth column indicates the maximum length of the character set. The maximum length of utf8, which we often use when creating tables, in MySQL is actually 3 characters. It is generally believed that utf8 uses 1 to 4 characters to represent specific content, but utf8 in MySQL is actually an alias for utf8mb3. The authentic one that uses up to 4 characters in MySQL is utf8mb4.

Notice:

utf8mb4
utf8mb4

Comparison Rules

In the above picture, the third column Default collation is the default comparison rule of these character sets. For example, the default comparison rule for the acsii character set is ascii_general_ci, and the default comparison rule for the utf8 character set is utf8_general_ci.

In fact, each character set has many comparison rules. For example, let's take a look at the comparison rules of the acsii character set, as shown in the following figure

mysql> show collation like 'ascii%';
+------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+------------------+---------+----+---------+----------+---------+
| ascii_general_ci | ascii | 11 | Yes | | 0 |
| ascii_bin | ascii | 65 | | | 0 |
+------------------+---------+----+---------+----------+---------+

ascii_general_ci indicates a general comparison, where _ci means case insensitive. (On the contrary, _cs means case-sensitive)

ascii_bin means binary comparison

Let's take a look at the comparison rules of utf8 and utf8mb4 character sets, as shown in the following figure

mysql> show collation like 'utf8%';
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
| utf8_german2_ci | utf8 | 212 | | Yes | 8 |
| utf8_croatian_ci | utf8 | 213 | | Yes | 8 |
| utf8_unicode_520_ci | utf8 | 214 | | Yes | 8 |
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 |
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
| utf8mb4_romanian_ci | utf8mb4 | 227 | | Yes | 8 |
| utf8mb4_slovenian_ci | utf8mb4 | 228 | | Yes | 8 |
| utf8mb4_polish_ci | utf8mb4 | 229 | | Yes | 8 |
| utf8mb4_estonian_ci | utf8mb4 | 230 | | Yes | 8 |
| utf8mb4_spanish_ci | utf8mb4 | 231 | | Yes | 8 |
| utf8mb4_swedish_ci | utf8mb4 | 232 | | Yes | 8 |
| utf8mb4_turkish_ci | utf8mb4 | 233 | | Yes | 8 |
| utf8mb4_czech_ci | utf8mb4 | 234 | | Yes | 8 |
| utf8mb4_danish_ci | utf8mb4 | 235 | | Yes | 8 |
| utf8mb4_lithuanian_ci | utf8mb4 | 236 | | Yes | 8 |
| utf8mb4_slovak_ci | utf8mb4 | 237 | | Yes | 8 |
| utf8mb4_spanish2_ci | utf8mb4 | 238 | | Yes | 8 |
| utf8mb4_roman_ci | utf8mb4 | 239 | | Yes | 8 |
| utf8mb4_persian_ci | utf8mb4 | 240 | | Yes | 8 |
| utf8mb4_esperanto_ci | utf8mb4 | 241 | | Yes | 8 |
| utf8mb4_hungarian_ci | utf8mb4 | 242 | | Yes | 8 |
| utf8mb4_sinhala_ci | utf8mb4 | 243 | | Yes | 8 |
| utf8mb4_german2_ci | utf8mb4 | 244 | | Yes | 8 |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 |
| utf8mb4_vietnamese_ci | utf8mb4 | 247 | | Yes | 8 |
+--------------------------+---------+-----+---------+----------+---------+

Using different comparison rules to compare data may result in different results.

The character set and comparison rules correspond to each other. No matter you modify only the character set or only the comparison rules, the other will automatically change accordingly.

Four levels of character sets and comparison rules

The character set and comparison rules are mentioned above. In fact, there are different levels of character sets and comparison rules in MySQL:

  • Server Level
  • Database level
  • Table Level
  • Column level

For server-level character sets, see the parameter character_set_server , and for server-level comparison rules, see the parameter collation_server

mysql> show variables like 'character_set_server';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_server';
+------------------+-----------------+
| Variable_name | Value |
+------------------+-----------------+
| collation_server | utf8_general_ci |
+------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules at the database level, you can first use xxxdb to switch to a specific database, and then execute show variables like 'character_set_database' and show variables like 'collation_database' to view the character set and comparison rules of the database.

Database changed
mysql> show variables like 'character_set_database';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> show variables like 'collation_database';
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

If you want to view the character set and comparison rules of a table or a column in a table, you can use show create table xxxtb to view the specific table creation statement, which contains the character set and comparison rules used.

If the character set and comparison rules are not specified when the database is created, the server-level character set and comparison rules will be used as the database's character set and comparison rules. If the character set and comparison rules are not specified in the statement for creating or modifying a table, the character set and comparison rules of the database where the table is located will be used as the character set and comparison rules of the table.

Three system variables

The MySQL server can correspond to multiple clients, and the encoding methods of these clients may be different, but the character set of the data in the database is certain. MySQL must perform encoding conversion processing for different clients and then return the results to the client.

In MySQL, there are three SESSION-level system variables that can perform the above operations:

  • character_set_client : The character set used by the server to decode client requests
  • character_set_connection: When the server processes the request, it converts the request string from character_set_client to character_set_connection
  • character_set_results : The character set used by the server when returning data to the client

The MySQL server maintains a copy of these three variables at the session level for each client.

However, in actual production environments, in order to avoid trouble, companies generally have regulations that require clients to use the utf8 character set by default.

The above is the detailed summary of MySQL character set. For more information about MySQL character set, please pay attention to other related articles on 123WORDPRESS.COM!

You may also be interested in:
  • Practical tutorial on modifying MySQL character set
  • Detailed explanation of MYSQL character set setting method (terminal character set)
  • Detailed explanation of character sets and validation rules in MySQL
  • MySQL character set viewing and modification tutorial
  • How to modify the MySQL character set
  • Causes and solutions to the garbled character set problem in MySQL database
  • How to change MySQL character set utf8 to utf8mb4
  • How to unify the character set on an existing mysql database
  • MySQL character set garbled characters and solutions
  • Detailed explanation of JDBC's processing of Mysql utf8mb4 character set
  • How to change the default character set of MySQL to utf8 on MAC

<<:  Example code for implementing equal height layout in multiple ways with CSS

>>:  Solution to the failure of entering the container due to full docker space

Recommend

Understanding flex-grow, flex-shrink, flex-basis and nine-grid layout

1. flex-grow, flex-shrink, flex-basis properties ...

Pricing table implemented with CSS3

Result: Implementation Code html <div id="...

Nginx Layer 4 Load Balancing Configuration Guide

1. Introduction to Layer 4 Load Balancing What is...

Introduction to Vue3 Composition API

Table of contents Overview Example Why is it need...

Detailed explanation on how to modify the default port of nginx

First find out where the configuration file is wh...

Nginx configuration location matching rules example explanation

The scope of nginx configuration instructions can...

How to view and modify the time zone in MySQL

Today I found that a program inserted an incorrec...

Interpretation of 17 advertising effectiveness measures

1. 85% of ads go unread <br />Interpretatio...

UCenter Home site adds statistics code

UCenter Home is an SNS website building system rel...

Detailed explanation of flex layout in CSS

Flex layout is also called elastic layout. Any co...

js to achieve simple magnifying glass effects

This article example shares the specific code of ...

Vue keeps the user logged in (various token storage methods)

Table of contents How to set cookies Disadvantage...