When choosing the data type of id, you need to consider not only the data storage type, but also how MySQL calculates and compares this type. For example, MySQL stores ENUM and SET types as integers internally, but compares them as strings in string scenarios. Once the data type of the id is selected, you need to ensure that the data types of the related data tables that reference the id are consistent and completely consistent, including properties such as length and whether it is signed! Mixing different data types can cause performance problems, and even if there are no performance problems, implicit data conversions during comparisons can cause subtle errors. However, if you forget about the different data types during the actual development process, unexpected problems may suddenly arise. When choosing the length, you also need to choose a small field length as much as possible and leave some room for future growth. For example, if it is used to store provinces, we only have dozens of values. At this time, it is better to use TINYINT rather than INT. If the related table also stores this id, the efficiency difference will be very large. Here are some typical types that apply to ids:
If you do want to use a UUID value, you should remove the "-" character, or use the UNHEX function to convert it to a 16-byte number and store it using BINARY(16). Then you can use the HEX function to get it in hexadecimal format. There are many ways to generate UUID, some are randomly distributed, and some are ordered, but even the ordered ones do not perform as well as integers. Summary of Distributed ID SolutionsID is the unique identifier of data. The traditional approach is to use UUID and the database's auto-increment ID. Today, MySQL is used more and more widely, and because it requires transaction support, the Innodb storage engine is usually used. UUID is too long and disordered, so it is not suitable as the primary key in Innodb. Auto-increment ID is more appropriate, but with business development, the amount of data will become larger and larger, and the data needs to be divided into tables. After the tables are divided, the data in each table will be auto-incremented at its own pace, and there is a high probability of ID conflicts. At this time, a separate mechanism is needed to generate a unique ID. The generated ID can also be called a distributed ID or a global ID. Let's analyze the mechanisms for generating distributed IDs. Database auto-increment IDThis method is based on the database's auto-increment ID. You need to use a separate database instance and create a separate table in this instance: The table structure is as follows: CREATE DATABASE `SEQID`; CREATE TABLE SEQID.SEQUENCE_ID ( id bigint(20) unsigned NOT NULL auto_increment, stub char(10) NOT NULL default '', PRIMARY KEY (id), UNIQUE KEY stub (stub) )ENGINE=MyISAM; You can use the following statement to generate and obtain an auto-increment ID begin; replace into SEQUENCE_ID (stub) VALUES ('anyword'); select last_insert_id(); commit; The stub field does not have any special meaning here. It is just for the convenience of inserting data. Only when data can be inserted can an auto-increment id be generated. For insertion, we use replace. Replace will first check whether there is data with the same value as the stub specified. If it exists, it will delete it first and then insert it. If it does not exist, it will directly insert it. This mechanism for generating distributed IDs requires a separate MySQL instance. Although it is feasible, it is not enough in terms of performance and reliability. Every time a business system needs an ID, it needs to request the database to obtain it, which has low performance. If this database instance goes offline, all business systems will be affected. ; Therefore, the data in this way has a certain degree of unreliability. Database multi-master modeIf our two databases form a master-slave cluster, the database reliability problem can be solved under normal circumstances. However, if the master database fails and the data is not synchronized to the slave database in time, ID duplication will occur. This is why we can use a multi-master mode☞dual-master mode cluster, that is, both MySQL instances can generate auto-increment IDs independently, which can improve efficiency. However, if no other modifications are made, the two MySQL instances are likely to generate the same ID. You need to configure different starting values and auto-increment steps for each MySQL instance. First MySQL instance configuration (mysql_01): set @@auto_increment_offset = 1; -- starting value set @@auto_increment_increment = 2; -- step length Second MySQL instance configuration (mysql_02): set @@auto_increment_offset = 2; -- starting value set @@auto_increment_increment = 2; -- step length After the above configuration, the id sequences generated by the two MySQL instances are as follows: For this solution of generating distributed IDs, you need to add a new distributed ID generation application, such as DistributIdService. This application provides an interface for business applications to obtain IDs. When a business application needs an ID, it requests DistributIdService through RPC. DistributIdService randomly obtains the ID from the two MySQL instances above. After implementing this solution, even if one of the MySQL instances goes offline, it will not affect DistributIdService, and DistributIdService can still use another MySQL to generate IDs. However, this solution does not scale well. If two MySQL instances are not enough and you need to add a new MySQL instance to improve performance, it will be troublesome. Now if you want to add a new instance mysql_03, how do you do it?
Number segment modeThis mode can be understood as batch acquisition. For example, when DistributIdService obtains IDs from the database, if multiple IDs can be obtained in batches and cached locally, it will greatly improve the efficiency of business applications in obtaining IDs. For example, each time DistributIdService obtains an ID from the database, it obtains a number segment, such as (1,1000]. This range represents 1000 IDs. When the business application requests DistributIdService to provide an ID, DistributIdService only needs to increment the number locally from 1 and return it, without requesting the database every time. It does not go to the database to obtain the next number segment until the local number is incremented to 1000, that is, when the current number segment has been used up. Therefore, we need to modify the database table as follows: CREATE TABLE id_generator ( id int(10) NOT NULL, current_max_id bigint(20) NOT NULL COMMENT 'Current maximum id', increment_step int(10) NOT NULL COMMENT 'Increment step', PRIMARY KEY (`id`) )ENGINE=InnoDB DEFAULT CHARSET=utf8; This database table is used to record the auto-increment step and the current maximum value of the auto-increment ID (that is, the last value of the currently applied number segment). Because the auto-increment logic has been moved to DistributIdService, the database no longer needs this logic. This solution is no longer strongly dependent on the database. Even if the database is unavailable, DistributIdService can continue to support it for a period of time. However, if the DistributIdService is restarted, a segment of IDs will be lost, resulting in ID holes. In order to improve the high availability of DistributIdService, a cluster is required. When a business requests the DistributIdService cluster to obtain an ID, it will randomly select a DistributIdService node to obtain it. For each DistributIdService node, the database is connected to the same database, so multiple DistributIdService nodes may request the database to obtain the number segment at the same time. In this case, optimistic locking is required for control. For example, a version field is added to the database table. When obtaining the number segment, the following SQL is used: update id_generator set current_max_id=#{newMaxId}, version=version+1 where version = #{version} Because newMaxId is calculated in DistributIdService based on oldMaxId + step size, as long as the above update is successful, it means that the number segment is obtained successfully. In order to provide high availability of the database layer, the database needs to be deployed in multi-master mode. For each database, it is necessary to ensure that the generated number segments are not repeated. This requires using the original idea and adding the starting value and step size to the database table. For example, if there are two MySQL servers, then: For specific implementation code, please refer to: tinyid Snowflake Algorithm The three modes of database self-increment ID mode, database multi-master mode, and number segment mode are all based on the idea of self-increment; the idea of the snowflake algorithm can be briefly understood below. The core idea is that the distributed ID is a fixed long number. A long number occupies 8 bytes, that is, 64 bits. The allocation of bits in the original Snowflake algorithm is as follows:
According to the logic of this algorithm, we only need to implement this algorithm in Java language and encapsulate it into a tool method. Then each business application can directly use this tool method to obtain the distributed ID. We only need to ensure that each business application has its own working machine ID, without having to build a separate application to obtain the distributed ID. It also does not rely on a database. Specific code implementation package com.yeming.tinyid.application; import static java.lang.System.*; /** * @author yeming.gao * @Description: Snowflake algorithm implementation* <p> * The SnowFlake algorithm is used to generate 64-bit IDs, which can be stored in long integers and can be used to generate unique IDs in distributed systems. * And the generated IDs have a rough order. In this implementation, the generated 64-bit ID can be divided into 5 parts: * 0 - 41-bit timestamp - 5-bit data center ID - 5-bit machine ID - 12-bit serial number * @date 2020/07/28 16:15 */ public class SnowFlake { /** * Starting timestamp */ private static final long START_STMP = 1480166465631L; /** * Number of digits occupied by the machine ID */ private static final long MACHINE_BIT = 5; /** * Number of bits occupied by the data center */ private static final long DATACENTER_BIT = 5; /** * Number of digits occupied by the serial number */ private static final long SEQUENCE_BIT = 12; /** * Maximum value of machine ID*/ private static final long MAX_MACHINE_NUM = ~(-1L << MACHINE_BIT); /** * Data center maximum value */ private static final long MAX_DATACENTER_NUM = ~(-1L << DATACENTER_BIT); /** * Maximum serial number value*/ private static final long MAX_SEQUENCE = ~(-1L << SEQUENCE_BIT); /** * The displacement of each part to the left*/ private static final long MACHINE_LEFT = SEQUENCE_BIT; private static final long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT; private static final long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT; private long datacenterId; //Data centerprivate long machineId; //Machine IDprivate long sequence = 0L; //Serial numberprivate long lastStmp = -1L; //Last timestampprivate SnowFlake(long datacenterId, long machineId) { if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) { throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0"); } if (machineId > MAX_MACHINE_NUM || machineId < 0) { throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0"); } this.datacenterId = datacenterId; this.machineId = machineId; } /** * Generate the next ID * * @return long */ private synchronized long nextId() { long currStmp = System.currentTimeMillis(); if (currStmp < lastStmp) { throw new RuntimeException("Clock moved backwards. Refusing to generate id"); } if (currStmp == lastStmp) { //In the same millisecond, the sequence number increases automatically sequence = (sequence + 1) & MAX_SEQUENCE; //The number of sequences in the same millisecond has reached the maximum if (sequence == 0L) { currStmp = getNextMill(); } } else { //In different milliseconds, the sequence number is set to 0 sequence = 0L; } lastStmp = currStmp; return (currStmp - START_STMP) << TIMESTMP_LEFT //Timestamp part | datacenterId << DATACENTER_LEFT //Data center part | machineId << MACHINE_LEFT //Machine ID part | sequence; //Serial number part} private long getNextMill() { long mill = System.currentTimeMillis(); while (mill <= lastStmp) { mill = System.currentTimeMillis(); } return mill; } public static void main(String[] args) { SnowFlake snowFlake = new SnowFlake(2, 3); // Maximum value of data center identifier long maxDatacenterNum = ~(-1L << DATACENTER_BIT); //Maximum value of machine identifier long maxMachineNum = ~(-1L << MACHINE_BIT); // Maximum value of sequence number long maxSequence = ~(-1L << SEQUENCE_BIT); out.println("Maximum value of data center ID: " + maxDatacenterNum + "; Maximum value of machine ID: " + maxMachineNum + "; Maximum value of sequence number: " + maxSequence); for (int i = 0; i < (1 << 12); i++) { out.println(snowFlake.nextId()); } } } The snowflake algorithm can refer to:
The above is the details of how to choose the appropriate data type for MySQL id. For more information about choosing the appropriate data type for MySQL id, please pay attention to other related articles on 123WORDPRESS.COM! You may also be interested in:
|
<<: 3 different ways to clear the option options in the select tag
need Recently, we need to migrate Node online ser...
The overall architecture of NGINX is characterize...
Docker is becoming more and more mature and its f...
Today I made a Spring Festival gold coin red enve...
When using Nginx as a Web server, I encountered t...
Download from official website: https://www.mysql...
Use the following command to create a container a...
Detailed explanation of the usage of DECIMAL in M...
The root account of mysql, I usually use localhos...
General CSS code will only cause minor issues wit...
1. Background The company's projects have alw...
Preface I have always wanted to know how a SQL st...
Table of contents 1. Create a redis docker base i...
The format of textarea can be saved to the databas...
IE gave us a headache in the early stages of deve...