Preface

Execute the SHOW CREATE TABLE <tablename> command in MySQL to see the statement that creates a table. For example:

    CREATE TABLE `table1` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `field1` text COLLATE utf8_unicode_ci NOT NULL COMMENT 'Field 1',
      `field2` varchar(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT 'Field 2',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Most of this statement is easy to understand, but today we are going to discuss the COLLATE keyword. What does its value, utf8_unicode_ci, mean? Asking this question in a DBA interview would probably stump most candidates.

What is COLLATE used for?

Developers who use phpMyAdmin may find this setting familiar, because the column header in its Chinese interface already gives the answer: utf8_unicode_ci is a rule used for sorting. Every character-type column in MySQL, such as VARCHAR, CHAR, and TEXT, needs a COLLATE that tells MySQL how to sort and compare the values in that column. In short, COLLATE affects the order produced by ORDER BY, the rows selected by greater-than or less-than comparisons in a WHERE condition, and the results of DISTINCT, GROUP BY, and HAVING. It also affects how MySQL builds an index on a character-type column, although this effect is not directly visible to us. Wherever character values are compared or sorted, COLLATE is involved.

The difference between various COLLATEs

COLLATE is tied to the character encoding (CHARSET). Each CHARSET supports multiple COLLATEs and designates one of them as its default. For example, the default COLLATE for the latin1 encoding is latin1_swedish_ci, the default for gbk is gbk_chinese_ci, and the default for utf8mb4 (before MySQL 8.0) is utf8mb4_general_ci.
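To check which default COLLATE a given server actually assigns to each CHARSET, the information_schema can be queried. This is only a minimal sketch: the values returned depend on the MySQL version, and a character set such as gbk appears only if the server was built with it.

```sql
-- List a few character sets together with their default collation.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'gbk', 'utf8mb4');

-- The same information for a single character set, in SHOW form.
SHOW CHARACTER SET LIKE 'utf8mb4';
```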
By the way, MySQL has two similarly named encodings: utf8 and utf8mb4. In MySQL, please forget **utf8** and always use **utf8mb4**. This is a legacy issue: MySQL's utf8 can only store characters encoded in at most 3 bytes, so text that needs 4-byte characters is not supported by utf8 and must use utf8mb4.

Many COLLATE names end in _ci, which stands for Case Insensitive: "A" and "a" are treated as equal when sorting and comparing. For example, SELECT * FROM table1 WHERE field1 = 'a' also returns rows whose field1 is "A". COLLATEs with the _cs suffix are Case Sensitive.

Use the SHOW COLLATION command to list all the COLLATEs that MySQL supports. Taking utf8mb4 as an example, the COLLATEs available for this encoding are shown in the figure below.

[Figure: all COLLATEs related to utf8mb4 in MySQL]

The figure includes sorting rules for the languages of many countries. The three most commonly used in China are utf8mb4_general_ci (the default), utf8mb4_unicode_ci, and utf8mb4_bin. Let's look at how they differ.

First, utf8mb4_bin treats all characters as binary strings and compares them from the most significant bit to the least significant bit, so it is obviously case sensitive.

For Chinese and English text there is effectively no difference between utf8mb4_unicode_ci and utf8mb4_general_ci, so for systems used only in China either one will do. The difference shows up with the letters of some Western languages, where utf8mb4_unicode_ci matches the conventions of those languages better than utf8mb4_general_ci, which is an older, simplified MySQL collation. For example, under utf8mb4_unicode_ci the German letter "ß" is equal to the two letters "ss" (which matches German usage), whereas under utf8mb4_general_ci it is equal to the single letter "s".
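The comparisons described above can be tried directly with string literals and an explicit COLLATE clause. This is a minimal sketch that assumes the connection character set is utf8mb4 (hence the SET NAMES statement); the column aliases are made up for illustration. If the server behaves as described in the text, the four columns should come back as 1, 0, 1, 1.

```sql
-- Make sure string literals on this connection are sent as utf8mb4.
SET NAMES utf8mb4;

SELECT
  'a' = 'A'  COLLATE utf8mb4_general_ci AS ci_equal,    -- case-insensitive comparison
  'a' = 'A'  COLLATE utf8mb4_bin        AS bin_equal,   -- binary comparison, case sensitive
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ss,  -- "ß" compared with "ss"
  'ß' = 's'  COLLATE utf8mb4_general_ci AS general_s;   -- "ß" compared with "s"

-- List the collations available for utf8mb4 on this server.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```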
However, the subtle differences between the two collations are hard to notice in everyday development. We seldom sort directly on text fields, and even if one or two letters end up in the wrong order, is that really going to be catastrophic for the system? In the posts and discussions found on the Internet, most people recommend utf8mb4_unicode_ci, but they do not strongly object to systems that use the default and do not see it as a serious problem.

Conclusion: utf8mb4_unicode_ci is recommended, but systems that already use utf8mb4_general_ci do not need to spend time changing it.

Another point to note is that starting from MySQL 8.0 the default CHARSET is no longer latin1 but utf8mb4, and the default COLLATE has changed to utf8mb4_0900_ai_ci. utf8mb4_0900_ai_ci is roughly a further refinement of the unicode collations: 0900 refers to version 9.0.0 of the Unicode Collation Algorithm, and ai means accent insensitive, so that, for example, e, è, é, ê and ë are treated as the same letter.

COLLATE setting levels and their precedence

COLLATE can be specified at the instance level, database level, table level, column level, and SQL statement level.

The instance-level COLLATE is set through the collation_server system variable, either in the MySQL configuration file or on the startup command line.

The statement that sets COLLATE at the database level is:

    CREATE DATABASE <db_name> DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The table-level COLLATE is set by adding the relevant options when creating the table, for example:

    CREATE TABLE <table_name> ( … ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If CHARSET and COLLATE are not set at the table level, the table inherits the CHARSET and COLLATE of the database.
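To see what is currently in effect at the server, database, and table levels, the following queries can be used. This is a sketch only: demo_db is a hypothetical database name, and table1 is the example table from the preface. MySQL exposes the server-wide defaults as the collation_server and character_set_server variables.

```sql
-- Instance (server) level defaults.
SHOW VARIABLES LIKE 'collation_server';
SHOW VARIABLES LIKE 'character_set_server';

-- Database level defaults; demo_db is a hypothetical name.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'demo_db';

-- Table level default, as inherited or explicitly set.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'demo_db' AND TABLE_NAME = 'table1';
```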
Column-level settings are specified when declaring the column in CREATE TABLE, for example:

    CREATE TABLE <table_name> (
      `field1` VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
      …
    ) …

If CHARSET and COLLATE are not set at the column level, the column inherits the CHARSET and COLLATE of the table.

Finally, you can also declare COLLATE explicitly in a SQL query to override the COLLATE of any table column. This is not commonly used, but it is worth knowing:

    SELECT DISTINCT field1 COLLATE utf8mb4_general_ci FROM table1;
    SELECT field1, field2 FROM table1 ORDER BY field1 COLLATE utf8mb4_unicode_ci;

If COLLATE is specified at every level, the order of precedence is: SQL statement > column level > table level > database level > instance level. That is, the COLLATE specified on a column overrides the one specified on the table, and the COLLATE specified on the table overrides the one at the database level. If a level does not specify COLLATE, it inherits the setting from the next level up; for example, a column without its own COLLATE uses the COLLATE set on the table.

That covers the essentials of COLLATE in MySQL. In system design, however, we should try not to make the system depend heavily on the sort order of Chinese fields, and we should also try to avoid using Chinese text as a query condition in MySQL.
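The inheritance and precedence rules for COLLATE can be tried out on a throwaway table. The sketch below is an illustration under assumptions: collate_demo and its column names are hypothetical, the connection uses utf8mb4, and the statements run inside some scratch database. If the rules hold as described, the three counts should be 2, 1, and 1.

```sql
SET NAMES utf8mb4;

-- One column inherits the table COLLATE, the other overrides it.
CREATE TABLE collate_demo (
  inherited  VARCHAR(64) NOT NULL DEFAULT '',
  overridden VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Show the COLLATE each column actually ended up with.
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'collate_demo';

INSERT INTO collate_demo (inherited, overridden) VALUES ('a', 'a'), ('A', 'A');

-- Table-level COLLATE (case insensitive): matches both rows.
SELECT COUNT(*) FROM collate_demo WHERE inherited = 'a';

-- Column-level COLLATE (binary): matches only the lowercase row.
SELECT COUNT(*) FROM collate_demo WHERE overridden = 'a';

-- SQL-level COLLATE overrides the column setting for this comparison only.
SELECT COUNT(*) FROM collate_demo WHERE inherited = 'a' COLLATE utf8mb4_bin;

DROP TABLE collate_demo;
```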