In-depth analysis of why MySQL does not recommend using uuid or snowflake id as primary key

In-depth analysis of why MySQL does not recommend using uuid or snowflake id as primary key

Preface: When designing a table in MySQL, MySQL officially recommends not using UUID or non-continuous and non-repeating snowflake IDs (long and unique), but recommends continuous self-incrementing primary key IDs. The official recommendation is auto_increment. So why is UUID not recommended? What are the disadvantages of using UUID? In this blog, we will analyze this problem and explore the internal reasons.

1: mysql and program examples

1.1: To illustrate this problem, we first create three tables, namely user_auto_key, user_uuid, and user_random_key, which represent the automatically increasing primary key, uuid as the primary key, and random key as the primary key, respectively. We keep the others completely unchanged. According to the control variable method, we only generate the primary key of each table using different strategies, while the other fields are exactly the same, and then test the insertion speed and query speed of the table:

Note: The random key here actually refers to the ID calculated by the snowflake algorithm, which is not continuous, repeated, or irregular: a string of 18-bit long values

ID automatically generates table:

User UUID Table

Random primary key table:

1.2: Theory alone is not enough, let's go directly to the program and use spring's jdbcTemplate to implement the incremental test:

Technical framework: springboot+jdbcTemplate+junit+hutool. The principle of the program is to connect to your own test database, and then write the same amount of data in the same environment to analyze the insert time to synthesize its efficiency. In order to achieve the most realistic effect, all data is randomly generated, such as name, email address, and address. The program has been uploaded from gitee, and the address is at the bottom of the article.

package com.wyq.mysqldemo;
import cn.hutool.core.collection.CollectionUtil;
import com.wyq.mysqldemo.databaseobject.UserKeyAuto;
import com.wyq.mysqldemo.databaseobject.UserKeyRandom;
import com.wyq.mysqldemo.databaseobject.UserKeyUUID;
import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService;
import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService;
import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService;
import com.wyq.mysqldemo.util.JdbcTemplateService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.StopWatch;
import java.util.List;
@SpringBootTest
class MysqlDemoApplicationTests {

  @Autowired
  private JdbcTemplateService jdbcTemplateService;

  @Autowired
  private AutoKeyTableService autoKeyTableService;

  @Autowired
  private UUIDKeyTableService uuidKeyTableService;

  @Autowired
  private RandomKeyTableService randomKeyTableService;


  @Test
  void testDBTime() {

    StopWatch stopwatch = new StopWatch("Execute sql time consumption");


    /**
     * auto_increment key task */
    final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";

    List<UserKeyAuto> insertData = autoKeyTableService.getInsertData();
    stopwatch.start("Automatically generate key table task starts");
    long start1 = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false);
      System.out.println(insertResult);
    }
    long end1 = System.currentTimeMillis();
    System.out.println("Time consumed by auto key: " + (end1 - start1));

    stopwatch.stop();


    /**
     * uudID key
     */
    final String insertSql2 = "INSERT INTO user_uuid(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?,?)";

    List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData();
    stopwatch.start("UUID key table task starts");
    long begin = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true);
      System.out.println(insertResult);
    }
    long over = System.currentTimeMillis();
    System.out.println("Time consumed by UUID key: " + (over - begin));

    stopwatch.stop();


    /**
     * Random long value key
     */
    final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?,?)";
    List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData();
    stopwatch.start("Random long value key table task starts");
    Long start = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true);
      System.out.println(insertResult);
    }
    Long end = System.currentTimeMillis();
    System.out.println("Random key task time consumed: " + (end - start));
    stopwatch.stop();


    String result = stopwatch.prettyPrint();
    System.out.println(result);
  }

1.3: Program write results

user_key_auto write result:

User_random_key write result:

The result of writing user_uuid table:

1.4: Efficiency test results

When the existing data volume is 1.3 million, let's test inserting 100,000 data to see what the result will be:

It can be seen that when the data volume is about 1 million, the insertion efficiency of uuid is the lowest, and when 1.3 million data are added in the subsequent sequence, the time of uuid drops sharply. The overall efficiency ranking based on the time usage is: auto_key>random_key>uuid. UUID has the lowest efficiency. When the amount of data is large, the efficiency drops sharply. So why does this phenomenon occur? With doubts, let's explore this issue:

2: Comparison of index structures using uuid and auto-increment id

2.1: Using the internal structure of the auto-increment id

The values ​​of the auto-increment primary key are sequential, so Innodb stores each record behind a record. When the maximum fill factor of the page is reached (the default maximum fill factor of InnoDB is 15/16 of the page size, and 1/16 of the space will be reserved for future modifications):

① The next record will be written to a new page. Once the data is loaded in this order, the primary key page will be filled with nearly sequential records, which increases the maximum fill rate of the page and avoids page waste.

②The newly inserted row will be located under the original largest row. MySQL locates and addresses the row quickly, and does not consume extra time to calculate the position of the new row.

③Reduce page splitting and fragmentation

2.2: Index internal structure using uuid

Because UUID is completely irregular compared to sequential auto-increment IDs, the value of a new row does not necessarily have to be greater than the value of the previous primary key, so InnoDB cannot always insert new rows at the end of the index. Instead, it needs to find a new suitable position for the new row to allocate new space. This process requires a lot of additional operations. The lack of order in the data will lead to scattered data distribution, which will cause the following problems:

①: The target page to be written may have been flushed to disk and removed from the cache, or has not yet been loaded into the cache. InnoDB has to find and read the target page from disk into memory before inserting, which will cause a large amount of random IO

②: Because writes are out of order, InnoDB has to frequently split pages to allocate space for new rows. Page splits result in moving a large amount of data, and at least three pages need to be modified for one insertion.

③: Due to frequent page splits, pages become sparse and irregularly filled, which eventually leads to data fragmentation

After loading the random values ​​(uuid and snowflake id) into the clustered index (the default index type of innodb), it is sometimes necessary to do an OPTIMEIZE TABLE to rebuild the table and optimize the page filling, which will take some time.

Conclusion: When using InnoDB, you should insert rows in the auto-increment order of the primary key as much as possible, and use monotonically increasing clustering key values ​​to insert new rows as much as possible.

2.3: Disadvantages of using auto-increment id

So is there no harm in using an auto-incrementing id? No, there are also the following problems with self-incrementing IDs:

①: Once someone crawls your database, they can obtain your business growth information based on the database's self-incrementing ID, and it is easy to analyze your business situation

②: For high concurrent loads, InnoDB will cause obvious lock contention when inserting by primary key. The upper bound of the primary key will become a hot spot for contention because all insertions occur here. Concurrent insertions will cause gap lock contention.

③: The Auto_Increment lock mechanism will cause the snatch of the auto-increment lock, resulting in a certain performance loss

Appendix: To improve the lock contention problem of Auto_increment, you need to tune the configuration of innodb_autoinc_lock_mode

Three: Summary

This blog starts with the question raised at the beginning, creating a table and using jdbcTemplate to test the performance of different ID generation strategies in inserting large amounts of data. It then analyzes the different ID mechanisms in the MySQL index structure and their advantages and disadvantages, and explains in depth why UUID and random non-repeating IDs have performance loss in data insertion, and explains this issue in detail. In actual development, it is best to use auto-increment ID according to the official recommendation of MySQL. MySQL is profound and there are still many points worth optimizing inside that we need to learn.

Appendix: This blog demo address: https://gitee.com/Yrion/mysqlIdDemo

This concludes this article on in-depth analysis of why MySQL does not recommend using UUID or Snowflake ID as primary key. For more relevant content about MySQL UUID or Snowflake ID as primary key, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • A brief analysis of whether MySQL primary key uses numbers or uuids for faster query
  • MySQL method of generating random numbers, strings, dates, verification codes and UUIDs
  • The difference and advantages and disadvantages of Mysql primary key UUID and auto-increment primary key
  • How to modify server uuid in Mysql
  • How to remove horizontal lines when storing UUID in Mysql
  • Why does Mysql analyze and design table primary key not use uuid?

<<:  Linux uses bond to implement dual network cards to bind a single IP sample code

>>:  More elegant processing of dates in JavaScript based on Day.js

Recommend

Docker builds CMS on-demand system with player function

Table of contents text 1. Prepare the machine 2. ...

A little-known JS problem: [] == ![] is true, but {} == !{} is false

console.log( [] == ![] ) // true console.log( {} ...

Example analysis of mysql shared lock and exclusive lock usage

This article uses examples to illustrate the usag...

Front-end vue+express file upload and download example

Create a new server.js yarn init -y yarn add expr...

Hexadecimal color codes (full)

Red and pink, and their hexadecimal codes. #99003...

Specific steps for Vue browser to return monitoring

Preface When sharing a page, you hope to click th...

6 Uncommon HTML Tags

First: <abbr> or <acronym> These two s...

Summary of 9 excellent code comparison tools recommended under Linux

When we write code, we often need to know the dif...

Quickly get started with VUE 3 teleport components and usage syntax

Table of contents 1. Introduction to teleport 1.1...

MySQL green decompression version installation and configuration steps

Steps: 1. Install MySQL database 1. Download the ...

Use of Linux sed command

1. Function Introduction sed (Stream EDitor) is a...

Implementation of MySQL Multi-version Concurrency Control MVCC

Table of contents What is MVCC MVCC Implementatio...

A tutorial on how to install, use, and automatically compile TypeScript

1. Introduction to TypeScript The previous articl...

Detailed tutorial on installing mysql 8.0.20 on CentOS7.8

1. Install MySQL software Download and install My...

JavaScript to implement the function of changing avatar

This article shares the specific code of JavaScri...