Take you to understand MySQL character set settings in 5 minutes

Take you to understand MySQL character set settings in 5 minutes

1. Content Overview

When using MySQL, it is very important to understand the concepts of character sets and character sequences, as well as the impact of different settings on data storage and comparison. The "garbled characters" problem encountered by many students in their daily work is most likely caused by insufficient understanding of character sets and character sequences, as well as incorrect settings.

This article introduces the following contents from the basics to the complex ones:

  1. Basic concepts and connections between character sets and character sequences
  2. Character sets and character order settings supported by MySQL, and the relationship between each setting level
  3. View and set server, database, table, column-level character sets and character sequences
  4. When should I set the character set and character sequence?

2. Concepts and Relationships of Character Sets and Character Orders

When storing data, MySQL provides support for different character sets. When comparing data, different character order supports are provided.

MySQL provides settings at different levels, including server level, database level, table level, and column level, which can provide very precise settings.

What are character sets and character sequences? In simple terms:

  1. Character set: defines characters and character encodings.
  2. Collation: defines the comparison rules for characters.

For example:

There are four characters: A, B, a, b. The codes of these four characters are A = 0, B = 1, a = 2, b = 3 respectively. The characters + encoding here constitute the character set.

What if we want to compare the size of two characters? For example, A and B, or a and b, the most intuitive way to compare is to use their encoding, for example, because 0 < 1, so A < B.

In addition, for A and a, although they have different encodings, we think that uppercase and lowercase characters should be equal, that is, A == a.

Two comparison rules are defined above, and the set of these comparison rules is collation.

  1. If they are both uppercase and lowercase characters, their encoding sizes are compared;
  2. If two characters are of the same case, they are equal.

3. Character sets and character sequences supported by MySQL

MySQL supports multiple character sets and character sequences.

  1. A character set corresponds to at least one character sequence (usually one to many).
  2. Two different character sets cannot have the same character sequence.
  3. Each character set has a default character order.

The above is rather abstract. Let’s look at the next few sections to understand what’s going on.

1. View supported character sets

You can view the character sets supported by MYSQL in the following ways.

Method 1:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
...omitted

Method 2:

mysql> use information_schema;
mysql> select * from CHARACTER_SETS;
+--------------------+----------------------+-----------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+-----------------------------+--------+
| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 |
| dec8 | dec8_swedish_ci | DEC West European | 1 |
...omitted

When using SHOW CHARACTER SET to view, you can also add WHERE or LIKE restrictions.

Example 1: Using the WHERE condition.

mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

Example 2: Use the LIKE condition.

mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+--------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)

2. View the supported character sequences

Similarly, you can view the character sequences supported by MYSQL in the following way.

Method 1: View through SHOW COLLATION.

As you can see, the utf8 character set has more than 10 character sequences. Check whether the default character sequence is used by checking whether the value of Default is Yes.

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
...slightly

Method 2: Query information_schema.COLLATIONS.

mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |

3. Naming conventions for character sequences

The character sequences are named with their corresponding character set as prefix, as shown below. For example, the character sequence utf8_general_ci indicates that it is the character sequence of the character set utf8.

For more rules, please refer to the official documentation.

MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2; 
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+-----------------+
| utf8 | utf8_general_ci |
| utf8 | utf8_bin |
+--------------------+-----------------+
2 rows in set (0.00 sec)

4. Server character set and character sequence

Purpose: When you create a database and do not specify a character set or character order, the server character set and server character order will be used as the default character set and sorting rule for the database.

How to specify: You can specify it through command line parameters when starting the MySQL service. It can also be specified via variables in the configuration file.

Server default character set and character order: specified through compilation parameters when MySQL is compiled.

character_set_server and collation_server correspond to the server character set and server character sequence respectively.

1. Check the server character set and character sequence

They correspond to the character_set_server and collation_server system variables respectively.

mysql> SHOW VARIABLES LIKE "character_set_server";
mysql> SHOW VARIABLES LIKE "collation_server";

2. Specify when starting the service

You can specify the server character set and character sequence when starting the MySQL service. If not specified, the default character order is latin1, latin1_swedish_ci

mysqld --character-set-server=latin1 \
 --collation-server=latin1_swedish_ci

Specify the server character set separately. In this case, the server character sequence is the default character sequence of latin1, latin1_swedish_ci.

mysqld --character-set-server=latin1

3. Configuration file specification

In addition to specifying it in the command line parameters, you can also specify it in the configuration file, as shown below.

 [client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

4. Runtime modification

Example: Modification during runtime (will become invalid after restart. If you want to keep it unchanged after restart, you need to write it into the configuration file)

mysql> SET character_set_server = utf8;

5. Specify the default character set and character sequence when compiling

The default values ​​of character_set_server and collation_server can be specified through compilation options when MySQL is compiled:

cmake . -DDEFAULT_CHARSET=latin1 \
  -DDEFAULT_COLLATION=latin1_german1_ci

5. Database character set and character sequence

Purpose: To specify the character set and character sequence at the database level. Databases under the same MySQL service can specify different character sets/character sequences.

1. Set the character set/character sequence of the data

When creating or modifying a database, you can specify the character set and collation of the database through CHARACTER SET and COLLATE.

Create the database:

CREATE DATABASE db_name
 [[DEFAULT] CHARACTER SET charset_name]
 [[DEFAULT] COLLATE collation_name]

Modify the database:

ALTER DATABASE db_name
 [[DEFAULT] CHARACTER SET charset_name]
 [[DEFAULT] COLLATE collation_name]

Example: Create a database test_schema and set the character set to utf8. The default collation is utf8_general_ci.

CREATE DATABASE `test_schema` DEFAULT CHARACTER SET utf8;

2. Check the character set/character sequence of the database

There are 3 ways to view the character set/character order of the database.

Example 1: Check the character set and collation of test_schema. (Need to switch the default database)

mysql> use test_schema;
Database changed
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8 | utf8_general_ci |
+--------------------------+----------------------+
1 row in set (0.00 sec)

Example 2: You can also use the following command to view the character set and database of test_schema (no need to switch the default database)

mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name="test_schema";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test_schema | utf8 | utf8_general_ci |
+-------------+----------------------------+------------------------+
1 row in set (0.00 sec)

Example 3: You can also view the character set by viewing the statement for creating the database.

mysql> SHOW CREATE DATABASE test_schema;
+-------------+---------------------------------------------------------------------+
| Database | Create Database |
+-------------+---------------------------------------------------------------------+
| test_schema | CREATE DATABASE `test_schema` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+-------------+---------------------------------------------------------------------+
1 row in set (0.00 sec)

3. How are database character sets and character sequences determined?

When creating a database, if CHARACTER SET or COLLATE is specified, the corresponding character set and collation will prevail.
When creating a database, if the character set and collation rules are not specified, character_set_server and collation_server will prevail.

6. Table character set and character sequence

The syntax for creating and modifying a table is as follows. The character set and character sequence can be set through CHARACTER SET and COLLATE.

CREATE TABLE tbl_name (column_list)
 [[DEFAULT] CHARACTER SET charset_name]
 [COLLATE collation_name]]

ALTER TABLE tbl_name
 [[DEFAULT] CHARACTER SET charset_name]
 [COLLATE collation_name]

1. Create a table and specify the character set/character sequence

The following example specifies the character set as utf8 and the character sequence as the default.

CREATE TABLE `test_schema`.`test_table` (
 `id` INT NOT NULL COMMENT '',
 PRIMARY KEY (`id`) COMMENT '')
DEFAULT CHARACTER SET = utf8;

2. Check the character set/character sequence of the table

Similarly, there are 3 ways to view the character set/character order of a table.

Method 1: Use SHOW TABLE STATUS to view the table status. Note that the Collation is utf8_general_ci and the corresponding character set is utf8.

MariaDB [blog]> SHOW TABLE STATUS FROM test_schema \G;
*************************** 1. row ***************************
  Name: test_table
  Engine: InnoDB
 Version: 10
 Row_format: Compact
  Rows: 0
 Avg_row_length: 0
 Data_length: 16384
Max_data_length: 0
 Index_length: 0
 Data_free: 11534336
 Auto_increment: NULL
 Create_time: 2018-01-09 16:10:42
 Update_time: NULL
 Check_time: NULL
 Collation: utf8_general_ci
 Checksum: NULL
 Create_options: 
 Comment: 
1 row in set (0.00 sec)

Method 2: View the information of information_schema.TABLES.

mysql> USE test_schema;
mysql> SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "test_schema" AND TABLE_NAME = "test_table";
+-----------------+
| TABLE_COLLATION |
+-----------------+
|utf8_general_ci|
+-----------------+

Method 3: Confirm through SHOW CREATE TABLE.

mysql> SHOW CREATE TABLE test_table;
+------------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------+
| test_table | CREATE TABLE `test_table` (
 `id` int(11) NOT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

3. How to determine the table character set and character sequence

Assume that the values ​​of CHARACTER SET and COLLATE are charset_name and collation_name respectively. If you create a table:

If charset_name and collation_name are specified, use them.
If only charset_name is specified but collation_name is not, the character set uses charset_name and the character sequence uses the default character sequence corresponding to charset_name.

If only collation_name is specified but charset_name is not, the character sequence uses collation_name and the character set uses the character set associated with collation_name.

If charset_name and collation_name are not specified, the character set and collation settings of the database are used.

7. Column character set and sorting

For columns of type CHAR, VARCHAR, or TEXT, you can specify a character set or character sequence. The syntax is as follows:

col_name {CHAR | VARCHAR | TEXT} (col_length)
 [CHARACTER SET charset_name]
 [COLLATE collation_name]

1. Add a new column and specify the character set/collation rules

The example is as follows: (similar to creating a table)

mysql> ALTER TABLE test_table ADD COLUMN char_column VARCHAR(25) CHARACTER SET utf8;

2. Check the character set/character sequence of the column

Here are some examples:

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA="test_schema" AND TABLE_NAME="test_table" AND COLUMN_NAME="char_column";
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+-----------------+
| utf8 | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

3. Determine the column character set/collation rules

Assume that the values ​​of CHARACTER SET and COLLATE are charset_name and collation_name respectively:

If charset_name and collation_name are both clear, the character set and character order are based on charset_name and collation_name.

If only charset_name is specified and collation_name is not specified, the character set is charset_name and the character sequence is the default character sequence of charset_name.

If only collation_name is specified and charset_name is not specified, the character sequence is the collation_name and the character set is the character set associated with collation_name.

If charset_name and collation_name are not specified, the character set and character sequence of the table shall prevail.

8. Choice: When to set character set and character sequence

Generally speaking, it can be configured in three places:

Configured when creating a database.

The mysql server is configured when it is started.

When compiling MySQL from source code, configure it through compilation parameters

1. Method 1: Configure when creating a database

This method is more flexible and safer, and it does not depend on the default character set/character sequence. When you create a database, you specify the character set/character order. When you create tables and columns later, if you do not specify otherwise, the character set/character order of the corresponding database will be inherited.

CREATE DATABASE mydb
 DEFAULT CHARACTER SET utf8
 DEFAULT COLLATE utf8_general_ci;

2. Method 2: Configure when MySQL server starts

You can add the following configuration so that when MySQL server starts, character-set-server and collation-server will be configured.

When you create a database/table/column through the mysql client and do not explicitly declare the character set/character order, the character-set-server/collation-server will be used as the default character set/character order.

In addition, the character set/character order when the client and server connect still needs to be set through SET NAMES.

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

3. Method 3: When compiling MySQL from source code, set it through compilation parameters

If -DDEFAULT_CHARSET and -DDEFAULT_COLLATION are specified during compilation, then:

When creating a database or table, it will be used as the default character set/character sequence.

When the client connects to the server, it uses it as the default character set/character order. (No need to set NAMES separately)

shell> cmake . -DDEFAULT_CHARSET=utf8 \
  -DDEFAULT_COLLATION=utf8_general_ci

9. Write at the end

This article introduces the character set and character sequence related content in MySQL in detail. This part mainly focuses on the storage and comparison of data. In fact, there is still a very important part that has not been covered: the character set and character sequence settings for the connection.

There are also many garbled code problems caused by improper setting of the connection character set and character order. There is a lot of content to expand on this part, so it will be explained in the next article.

10. Related links

10.1 Character Set Support

https://dev.mysql.com/doc/refman/5.7/en/charset.html

Summarize

The above is what I introduced to you. I hope it will be helpful to you. If you have any questions, please leave me a message and I will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!

You may also be interested in:
  • In-depth analysis of Mysql character set settings
  • mysql odbc character set setting (Chinese characters are displayed in garbled characters)
  • In-depth Mysql character set settings [Essence combination]
  • Mysql default character set setting method (free installation version)
  • Regarding the MySQL character set, character_set_client=binary is set. In the case of GBK, the table description will be garbled.
  • In-depth MySQL character set settings graphic version
  • How to set the character set for mysql under Docker
  • Detailed explanation of character set settings based on MySQL 5.5
  • mysql commonly used settings character set encoding, auto-completion (auto prompt), monitor external network ip
  • Detailed explanation of MYSQL character set setting method (terminal character set)

<<:  How to use Nginx proxy to surf the Internet

>>:  In-depth understanding of Worker threads in Node.js

Recommend

Detailed explanation of query examples within subqueries in MySql

Where is my hometown when I look northwest? How m...

HTML thead tag definition and usage detailed introduction

Copy code The code is as follows: <thead> &...

How to filter out duplicate data when inserting large amounts of data into MySQL

Table of contents 1. Discover the problem 2. Dele...

Introduction to container data volumes in Docker

Table of contents Docker container data volume Us...

Installation and configuration method of vue-route routing management

introduce Vue Router is the official routing mana...

How to configure nginx to return text or json

Sometimes when requesting certain interfaces, you...

Docker-compose installation db2 database operation

It is troublesome to install the db2 database dir...

How to import, register and use components in batches in Vue

Preface Components are something we use very ofte...

VMware12.0 installation Ubuntu14.04 LTS tutorial

I have installed various images under virtual mac...

Specific steps to use vant framework in WeChat applet

Table of contents 1. Open the project directory o...

Detailed explanation on how to install MySQL database on Alibaba Cloud Server

Preface Since I needed to install Zookeeper durin...

HTML table tag tutorial (25): vertical alignment attribute VALIGN

In the vertical direction, you can set the row al...

Linux kernel device driver virtual file system notes

/******************** * Virtual File System VFS *...