A simple ID generation strategy: Implementation of generating globally unique ID from MySQL table

A simple ID generation strategy: Implementation of generating globally unique ID from MySQL table

There are many ways to generate a global ID. Here is a simple solution: using MySQL's auto-increment ID to generate a globally unique ID.

1. Create a table that only needs two fields:

CREATE TABLE `guid` (
 `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
 `stub` char(1) NOT NULL DEFAULT '' COMMENT 'stub field, used to occupy the pit',
 PRIMARY KEY (`id`),
 UNIQUE KEY `uk_stub` (`stub`) -- set stub as a unique index) ENGINE=MyISAM AUTO_INCREMENT=1000000000 DEFAULT CHARSET=utf8;

Specify the auto-increment start value: alter table guid auto_increment=1000000000, which ensures that the ID is 10 digits (it is almost impossible to increase it to 11 digits).

2. Define mybatis mapper:

@Mapper
public interface GuidMapper {


 //Get a globally unique ID
  * @return
  */
 // replace into afs_guid(stub) values('a');
 // select last_insert_id();
 @Insert("REPLACE INTO guid (stub) VALUES('a')")
 @SelectKey(statement = {"SELECT LAST_INSERT_ID()"}, keyProperty = "guidHolder.id", before = false, resultType = long.class)
 int getGuid( @Param("guidHolder") GuidHolder guidHolder);

 @Data
 public static class GuidHolder{
  private long id;
  private String stub;
 }

3. Testing

 GuidMapper.GuidHolder guidHolder = new GuidMapper.GuidHolder();
 int i = guidMapper.getGuid(guidHolder);
 long guid = guidHolder.getId();
 // guid is the returned ID

Tail

Concurrency safety issues

REPLACE INTO is similar to INSERT and is safe. It will first check whether the primary key or unique key is repeated. If it is repeated, it will delete the original one and add a new one to replace the original one.

SELECT LAST_INSERT_ID() is bound to the MySQL connection. On the current connection, the operation triggers the change of the auto_increment value, and the new value is obtained. This value is only visible to the current connection. Other connections will only get the value after it changes the auto_increment.

The above two points ensure concurrency safety.

In addition, even if you manually reduce the value of id, it will continue to increase based on the last auto-increment after the next replace into. This is because manually modifying the value of id will not change the value of auto_increment.

Supplementary knowledge: How to ensure the generation of distributed unique global IDs in high concurrency clusters

Preface

The system's unique ID is a problem we often encounter when designing a system, and we often struggle with this problem.

This article is to provide you with an idea for generating a distributed unique global ID generation solution, I hope it can help everyone.

Please feel free to point out any shortcomings! !

question

Why do we need a distributed globally unique ID and what are the business requirements for distributed IDs?

In complex distributed systems, it is often necessary to uniquely identify a large amount of data and messages, such as in Meituan Dianping’s finance, payment, catering, and hotel

The data in the system of products such as Maoyan Movies is gradually increasing. After the database is divided into different tables, a unique ID is needed to identify a piece of data or information.

In particular, Ian's orders, riders, and coupons all need to be identified by a unique ID.

At this time, a system that can generate a globally unique ID is very necessary

Some rigid requirements for ID generation rules

Globally unique

Trend increasing

The clustered index is used in the MySQL InnoDB engine. Since most RDBMS use the Btree data structure to store indexes, we should try to use ordered primary keys to ensure write performance when choosing primary keys.

Monotonically increasing

Ensure that the next ID is greater than the previous ID, such as transaction version number, IM incremental message, sorting and other special requirements

Information Security

If the ID is continuous, malicious users can easily crawl the URL in sequence. If it is an order number, it is dangerous because competitors can directly know our daily order volume. Therefore, in some application scenarios, the ID needs to be irregular to make it difficult for competitors to guess.

With timestamp

You can also quickly understand when this distributed ID is generated during development

Availability requirements for ID number generation systems

High Availability

When I issue a request to obtain a distributed ID, the server will guarantee that it will create a unique distributed ID for me in 99.999% of the cases.

Low latency

Send a request to get a distributed ID, the server must be fast, extremely fast

High QPS

For example, if 100,000 distributed ID creation requests come at the same time, the server must be able to handle it and successfully create 100,000 distributed IDs at once.

General universal solution

UUID

UUID.randomUUID() , the standard form of UUID contains 32 hexadecimal digits, divided into five segments by hyphens, 36 characters in the form of 8-4-4-4-12, with very high performance, locally generated, and no network consumption.

Problems

Poor database access performance because UUIDs are unordered

Unordered, its generation order cannot be predicted, and it cannot generate increasing and ordered numbers

First of all, distributed IDs are generally used gradually, but according to the official recommendation of MySQL, the primary key should be as short as possible. Each UUID is very long, so it is not recommended.

Primary key: When ID is used as the primary key, there will be some problems in certain environments.

For example, UUID is not suitable for use as a DB primary key. MySQL has a clear explanation.

Index, B+ tree index split

Since the distributed ID is the primary key, and the primary key contains an index, and the MySQL index is implemented through a B+ tree, each time new UUID data is inserted, the underlying B+ tree of the index will be modified for query optimization. Because UUID data is unordered, each insertion of UUID data will cause a major modification to the primary key's B+ tree, which is very bad. The insertion is completely unordered, which will not only cause some intermediate nodes to split, but also create many unsaturated nodes in vain, greatly reducing the performance of database insertion.

UUID can only guarantee global uniqueness, but does not satisfy the trend of increasing or monotonically increasing.

Database auto-increment primary key

Single Machine

In a distributed system, the main principle of the database's auto-increment ID mechanism is implemented by the database's auto-increment ID and the replace into of the MySQL database. The replace into here is similar to the insert function, with the difference that replace into first tries to insert into the data list. If it is found that this row of data already exists in the table (based on the primary key or unique index), it will be deleted first and then inserted. Otherwise, the new data will be directly inserted.

REPLACE INTO means inserting a record. If the value of the unique index in the table conflicts, the old data is replaced.

REPLACE into t_test(stub) values('b');

select LAST_INSERT_ID();

Every time we insert, we find that the original data will be replaced and the ID will increase

This is satisfying

Incremental

Monotonicity

Uniqueness

In a distributed situation with low concurrency, this solution can be used to obtain a globally unique ID.

Cluster Distributed Cluster

Is the database's self-increment ID mechanism suitable for distributed IDs? The answer is not suitable

It is difficult to expand the system horizontally. For example, after defining the step size and the number of machines, what should we do if we need to add a machine? Suppose there is a machine numbered: 1, 2, 3, 4, 5 (the step size is 1), and we need to expand the capacity of one machine. We can do this: set the initial value of the second machine to be much higher than the first one. It seems okay, but if there are 100 machines online, how to expand the capacity at this time is a nightmare. Therefore, the system horizontal expansion plan is complex and difficult to implement.

The database pressure is still very high. Every time an ID is obtained, the database must be read and written, which greatly affects performance and does not meet the low latency and high QPS rules in distributed IDs (under high concurrency, if the ID is obtained from the database, it will greatly affect performance)

Generate global ID strategy based on Redis

Standalone version

Because Redis is single-threaded, atomicity is inherently guaranteed and can be achieved using the atomic operations INCR and INCRBY

INCRBY: Set the growth step size

Cluster distribution

Note: In the case of Redis cluster, different growth steps need to be set as in MySQL, and the key must be set with a validity period. You can use Redis cluster to obtain higher throughput.

Assume that there are 5 Redis servers in a cluster. You can initialize the value of each Redis server to 1, 2, 3, 4, and 5 respectively, and then set the step size to 5.

The IDs generated by each Redis are:

A: 1 6 11 16 21

B: 2 7 12 17 22

C: 3 8 13 18 23

D: 4 9 14 19 24

E: 5 10 15 20 25

However, the problem is that the maintenance and configuration of the Redis cluster are rather troublesome. Because a single point of failure is required, the sentinel

But the main problem is that for an ID, you need to introduce the entire Redis cluster, which feels like overkill.

Snowflake Algorithm

What is

Twitter's distributed auto-incrementing ID algorithm, Snowflake

Initially, Twitter migrated its storage system from MySQL to Cassandra (an open source distributed NoSQL database system developed by Facebook). Because Cassandra did not have a sequential ID generation mechanism, it developed such a globally unique ID generation service.

Twitter's distributed snowflake algorithm SnowFlake can generate 260,000 auto-incrementing sortable IDs per second.

Twitter's SnowFlake generates IDs in a time-ordered manner

The result of the ID generated by the SnowFlake algorithm is a 64-bit integer, which is a Long type (the maximum length after conversion to a string is 19).

There will be no ID collision in the distributed system (differentiated by datacenter and workerID) and the efficiency is high.

In a distributed system, there are some scenarios that require a globally unique ID. The basic requirements for generating an ID are:

In a distributed environment, global uniqueness is required

Generally, it needs to be monotonically increasing, because generally unique IDs will exist in the database, and the characteristic of InnoDB is to store the content in the leaf nodes on the primary key index, and it increases from left to right. Therefore, considering the database performance, it is generally best to generate monotonically increasing IDs. To prevent ID conflicts, 36-bit UUIDs can be used, but UUIDs have some disadvantages. First, they are relatively long, and second, UUIDs are generally unordered.

You may also need no rules, because if you use a unique ID as the order number, you need this rule to prevent others from knowing the number of orders per day.

structure

Several core components of the snowflake algorithm

In Java, 64-bit certificates are of type long, so the ID generated by the SnowFlake algorithm is stored in the long class.

Part I

The highest bit in binary is the sign bit, 1 represents a negative number and 0 represents a positive number. The generated ID is usually an integer, so the highest bit is fixed to 0.

Part 2

The second part is the 41-bit timestamp bit, which is used to record the timestamp in milliseconds.

41 bits can represent 2^41 -1 numbers

If it is only used to represent positive integers, the range that can be represented is: 0 - 2^41 -1. The reason for minus 1 is that the range of values ​​that can be represented starts from 0, not 1.

That is to say, 41 bits can represent the value of 2^41 - 1 milliseconds, which is 69.73 years when converted into units of years.

Part 3

The third part is the work machine ID, 10 bits are used to record the work machine ID

Can be deployed on 2^10 = 1024 nodes, including 5-digit datacenterId (data center, computer room) and 5-digit workerID (machine code)

The maximum positive integer that can be represented by 5 bits is 2^5 = 31 numbers, which can represent different data centers and machine codes.

Part 4

The positive integer that 12 bits can be used to represent is 2^12 = 4095, that is, 0 1 2 … 4094 can be used to represent 4095 ID numbers generated by the same machine and the same timestamp.

SnowFlake can guarantee

All generated IDs increase in time trend

There will be no duplicate IDs in the entire distributed system because datacenterId and workerId are used to distinguish them.

accomplish

The snowflake algorithm is written in Scala, and some people use Java to implement it. The github address is

/**
 * Twitter's Snowflake algorithm -- Java implementation * 
 * @author beyond
 * @date 2016/11/26
 */
public class SnowFlake {

 /**
  * Starting timestamp */
 private final static long START_STMP = 1480166465631L;

 /**
  * The number of bits each part occupies*/
 private final static long SEQUENCE_BIT = 12; //Number of digits occupied by the serial number private final static long MACHINE_BIT = 5; //Number of digits occupied by the machine identifier private final static long DATACENTER_BIT = 5;//Number of digits occupied by the data center/**
  * Maximum value of each part*/
 private final static long MAX_DATACENTER_NUM = -1L ^ (-1L << DATACENTER_BIT);
 private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
 private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);

 /**
  * The displacement of each part to the left*/
 private final static long MACHINE_LEFT = SEQUENCE_BIT;
 private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
 private final static long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT;

 private long datacenterId; //Data centerprivate long machineId; //Machine IDprivate long sequence = 0L; //Serial numberprivate long lastStmp = -1L; //Last timestamppublic SnowFlake(long datacenterId, long machineId) {
  if (datacenterId > MAX_DATACENTER_NUM || datacenterId < 0) {
   throw new IllegalArgumentException("datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0");
  }
  if (machineId > MAX_MACHINE_NUM || machineId < 0) {
   throw new IllegalArgumentException("machineId can't be greater than MAX_MACHINE_NUM or less than 0");
  }
  this.datacenterId = datacenterId;
  this.machineId = machineId;
 }

 /**
  * Generate the next ID
  *
  * @return
  */
 public synchronized long nextId() {
  long currStmp = getNewstmp();
  if (currStmp < lastStmp) {
   throw new RuntimeException("Clock moved backwards. Refusing to generate id");
  }

  if (currStmp == lastStmp) {
   //In the same millisecond, the sequence number increases automatically sequence = (sequence + 1) & MAX_SEQUENCE;
   //The number of sequences in the same millisecond has reached the maximum if (sequence == 0L) {
    currStmp = getNextMill();
   }
  } else {
   //In different milliseconds, the sequence number is set to 0
   sequence = 0L;
  }

  lastStmp = currStmp;

  return (currStmp - START_STMP) << TIMESTMP_LEFT //Timestamp part | datacenterId << DATACENTER_LEFT //Data center part | machineId << MACHINE_LEFT //Machine ID part | sequence; //Serial number part}

 private long getNextMill() {
  long mill = getNewstmp();
  while (mill <= lastStmp) {
   mill = getNewstmp();
  }
  return mill;
 }

 private long getNewstmp() {
  return System.currentTimeMillis();
 }

 public static void main(String[] args) {
  SnowFlake snowFlake = new SnowFlake(2, 3);

  for (int i = 0; i < (1 << 12); i++) {
   System.out.println(snowFlake.nextId());
  }

 }
}

Project implementation experience

hutools toolkit

Address: https://github.com/looly/hutool

SpringBoot integrates snowflake algorithm

Introduce the hutool tool class

<dependency>
 <groupId>cn.hutool</groupId>
 <artifactId>hutool-all</artifactId>
 <version>5.3.1</version>
</dependency>

Integration

/**
 * Snowflake Algorithm*
 * @author: Moxi* @create: 2020-04-18-11:08
 */
public class SnowFlakeDemo {
 private long workerId = 0;
 private long datacenterId = 1;
 private Snowflake snowFlake = IdUtil.createSnowflake(workerId, datacenterId);

 @PostConstruct
 public void init() {
  try {
   //Convert the network ip into long
   workerId = NetUtil.ipv4ToLong(NetUtil.getLocalhostStr());
  } catch (Exception e) {
   e.printStackTrace();
  }
 }

 /**
  * Get the snowflake ID
  * @return
  */
 public synchronized long snowflakeId() {
  return this.snowFlake.nextId();
 }

 public synchronized long snowflakeId(long workerId, long datacenterId) {
  Snowflake snowflake = IdUtil.createSnowflake(workerId, datacenterId);
  return snowflake.nextId();
 }

 public static void main(String[] args) {
  SnowFlakeDemo snowFlakeDemo = new SnowFlakeDemo();
  for (int i = 0; i < 20; i++) {
   new Thread(() -> {
    System.out.println(snowFlakeDemo.snowflakeId());
   }, String.valueOf(i)).start();
  }
 }
}

Get the result

1251350711346790400
1251350711346790402
1251350711346790401
1251350711346790403
1251350711346790405
1251350711346790404
1251350711346790406
1251350711346790407
1251350711350984704
1251350711350984706
1251350711350984705
1251350711350984707
1251350711350984708
1251350711350984709
1251350711350984710
1251350711350984711
1251350711350984712
1251350711355179008
1251350711355179009
1251350711355179010

Pros and Cons

advantage

The milliseconds are in the high dimension, the auto-increment sequence is in the low dimension, and the entire ID is increasing in trend.

It does not rely on third-party systems such as databases, and is deployed as a service, which has higher stability and very high performance in generating IDs.

Bits can be allocated according to their own business characteristics, which is very flexible

shortcoming

Depends on the machine clock. If the machine clock is dialed back, duplicate IDs will be generated.

It is increasing on a single machine, but because it involves a distributed environment, the clocks on each machine cannot be completely synchronized, and sometimes there will be situations where the global increase is not present. This shortcoming can be considered unimportant. Generally, distributed IDs only require an increasing trend, and do not strictly require an increasing trend. 90% of the requirements only require an increasing trend.

Other Supplements

In order to solve the problem of clock dialing back, which leads to duplicate IDs, someone later proposed a solution.

Baidu's open source distributed unique ID generator UidGenerator

Leaf - Meituan Dianping Distributed ID Generation System

The above article, a simple ID generation strategy: Implementation of generating a globally unique ID from a Mysql table, is all I have to share with you. I hope it can give you a reference and I also hope that you will support 123WORDPRESS.COM.

You may also be interested in:
  • Java generates distributed id based on snowflake algorithm
  • Implementing distributed order numbers and distributed IDs based on Redis (custom rule generation)
  • Implementation of Redis generating globally unique ID for distributed systems
  • Detailed explanation of how to use MongoDB+Springboot to implement distributed ID
  • Java implements Twitter's distributed auto-increment ID algorithm Snowflake
  • Leaf Solution Implements Meituan Dianping's Distributed ID Generation System

<<:  JavaScript to show and hide the drop-down menu

>>:  Detailed tutorial on configuring nginx for https encrypted access

Recommend

Introduction to new features of ECMAscript

Table of contents 1. Default values ​​for functio...

Summary of the data storage structure of the nginx http module

Starting from this section, we will explain the i...

JavaScript to achieve uniform animation effect

This article example shares the specific code for...

Parameters to make iframe transparent

<iframe src="./ads_top_tian.html" all...

MySQL SQL Optimization Tutorial: IN and RANGE Queries

First, let's talk about the in() query. It is...

Detailed analysis of replication in Mysql

1.MySQL replication concept It means transferring...

Summary and examples of vue3 component communication methods

The communication modes of vue3 components are as...

Echarts implements switching different X-axes in one graph (example code)

Rendering If you want to achieve the effect shown...

How to quickly build a LAMP environment on CentOS platform

This article uses an example to describe how to q...

A brief discussion on React Component life cycle functions

What are the lifecycle functions of React compone...

Vue implements form data validation example code

Add rules to the el-form form: Define rules in da...

Solve the pitfall of storing boolean type values ​​in localstorage

LocalStorage stores Boolean values Today, when I ...

Two ways to declare private variables in JavaScript

Preface JavaScript is not like other languages ​​...

Html Select uses the selected attribute to set the default selection

Adding the attribute selected = "selected&quo...

VMware virtual machine to establish HTTP service steps analysis

1. Use xshell to connect to the virtual machine, ...