In-depth analysis of why MySQL does not recommend using uuid or snowflake id as primary key

In-depth analysis of why MySQL does not recommend using uuid or snowflake id as primary key

Preface: When designing a table in MySQL, MySQL officially recommends not using UUID or non-continuous and non-repeating snowflake IDs (long and unique), but recommends continuous self-incrementing primary key IDs. The official recommendation is auto_increment. So why is UUID not recommended? What are the disadvantages of using UUID? In this blog, we will analyze this problem and explore the internal reasons.

1: mysql and program examples

1.1: To illustrate this problem, we first create three tables, namely user_auto_key, user_uuid, and user_random_key, which represent the automatically increasing primary key, uuid as the primary key, and random key as the primary key, respectively. We keep the others completely unchanged. According to the control variable method, we only generate the primary key of each table using different strategies, while the other fields are exactly the same, and then test the insertion speed and query speed of the table:

Note: The random key here actually refers to the ID calculated by the snowflake algorithm, which is not continuous, repeated, or irregular: a string of 18-bit long values

ID automatically generates table:

User UUID Table

Random primary key table:

1.2: Theory alone is not enough, let's go directly to the program and use spring's jdbcTemplate to implement the incremental test:

Technical framework: springboot+jdbcTemplate+junit+hutool. The principle of the program is to connect to your own test database, and then write the same amount of data in the same environment to analyze the insert time to synthesize its efficiency. In order to achieve the most realistic effect, all data is randomly generated, such as name, email address, and address. The program has been uploaded from gitee, and the address is at the bottom of the article.

package com.wyq.mysqldemo;
import cn.hutool.core.collection.CollectionUtil;
import com.wyq.mysqldemo.databaseobject.UserKeyAuto;
import com.wyq.mysqldemo.databaseobject.UserKeyRandom;
import com.wyq.mysqldemo.databaseobject.UserKeyUUID;
import com.wyq.mysqldemo.diffkeytest.AutoKeyTableService;
import com.wyq.mysqldemo.diffkeytest.RandomKeyTableService;
import com.wyq.mysqldemo.diffkeytest.UUIDKeyTableService;
import com.wyq.mysqldemo.util.JdbcTemplateService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.util.StopWatch;
import java.util.List;
@SpringBootTest
class MysqlDemoApplicationTests {

  @Autowired
  private JdbcTemplateService jdbcTemplateService;

  @Autowired
  private AutoKeyTableService autoKeyTableService;

  @Autowired
  private UUIDKeyTableService uuidKeyTableService;

  @Autowired
  private RandomKeyTableService randomKeyTableService;


  @Test
  void testDBTime() {

    StopWatch stopwatch = new StopWatch("Execute sql time consumption");


    /**
     * auto_increment key task */
    final String insertSql = "INSERT INTO user_key_auto(user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?)";

    List<UserKeyAuto> insertData = autoKeyTableService.getInsertData();
    stopwatch.start("Automatically generate key table task starts");
    long start1 = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql, insertData, false);
      System.out.println(insertResult);
    }
    long end1 = System.currentTimeMillis();
    System.out.println("Time consumed by auto key: " + (end1 - start1));

    stopwatch.stop();


    /**
     * uudID key
     */
    final String insertSql2 = "INSERT INTO user_uuid(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?,?)";

    List<UserKeyUUID> insertData2 = uuidKeyTableService.getInsertData();
    stopwatch.start("UUID key table task starts");
    long begin = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql2, insertData2, true);
      System.out.println(insertResult);
    }
    long over = System.currentTimeMillis();
    System.out.println("Time consumed by UUID key: " + (over - begin));

    stopwatch.stop();


    /**
     * Random long value key
     */
    final String insertSql3 = "INSERT INTO user_random_key(id,user_id,user_name,sex,address,city,email,state) VALUES(?,?,?,?,?,?,?,?,?)";
    List<UserKeyRandom> insertData3 = randomKeyTableService.getInsertData();
    stopwatch.start("Random long value key table task starts");
    Long start = System.currentTimeMillis();
    if (CollectionUtil.isNotEmpty(insertData)) {
      boolean insertResult = jdbcTemplateService.insert(insertSql3, insertData3, true);
      System.out.println(insertResult);
    }
    Long end = System.currentTimeMillis();
    System.out.println("Random key task time consumed: " + (end - start));
    stopwatch.stop();


    String result = stopwatch.prettyPrint();
    System.out.println(result);
  }

1.3: Program write results

user_key_auto write result:

User_random_key write result:

The result of writing user_uuid table:

1.4: Efficiency test results

When the existing data volume is 1.3 million, let's test inserting 100,000 data to see what the result will be:

It can be seen that when the data volume is about 1 million, the insertion efficiency of uuid is the lowest, and when 1.3 million data are added in the subsequent sequence, the time of uuid drops sharply. The overall efficiency ranking based on the time usage is: auto_key>random_key>uuid. UUID has the lowest efficiency. When the amount of data is large, the efficiency drops sharply. So why does this phenomenon occur? With doubts, let's explore this issue:

2: Comparison of index structures using uuid and auto-increment id

2.1: Using the internal structure of the auto-increment id

The values ​​of the auto-increment primary key are sequential, so Innodb stores each record behind a record. When the maximum fill factor of the page is reached (the default maximum fill factor of InnoDB is 15/16 of the page size, and 1/16 of the space will be reserved for future modifications):

① The next record will be written to a new page. Once the data is loaded in this order, the primary key page will be filled with nearly sequential records, which increases the maximum fill rate of the page and avoids page waste.

②The newly inserted row will be located under the original largest row. MySQL locates and addresses the row quickly, and does not consume extra time to calculate the position of the new row.

③Reduce page splitting and fragmentation

2.2: Index internal structure using uuid

Because UUID is completely irregular compared to sequential auto-increment IDs, the value of a new row does not necessarily have to be greater than the value of the previous primary key, so InnoDB cannot always insert new rows at the end of the index. Instead, it needs to find a new suitable position for the new row to allocate new space. This process requires a lot of additional operations. The lack of order in the data will lead to scattered data distribution, which will cause the following problems:

①: The target page to be written may have been flushed to disk and removed from the cache, or has not yet been loaded into the cache. InnoDB has to find and read the target page from disk into memory before inserting, which will cause a large amount of random IO

②: Because writes are out of order, InnoDB has to frequently split pages to allocate space for new rows. Page splits result in moving a large amount of data, and at least three pages need to be modified for one insertion.

③: Due to frequent page splits, pages become sparse and irregularly filled, which eventually leads to data fragmentation

After loading the random values ​​(uuid and snowflake id) into the clustered index (the default index type of innodb), it is sometimes necessary to do an OPTIMEIZE TABLE to rebuild the table and optimize the page filling, which will take some time.

Conclusion: When using InnoDB, you should insert rows in the auto-increment order of the primary key as much as possible, and use monotonically increasing clustering key values ​​to insert new rows as much as possible.

2.3: Disadvantages of using auto-increment id

So is there no harm in using an auto-incrementing id? No, there are also the following problems with self-incrementing IDs:

①: Once someone crawls your database, they can obtain your business growth information based on the database's self-incrementing ID, and it is easy to analyze your business situation

②: For high concurrent loads, InnoDB will cause obvious lock contention when inserting by primary key. The upper bound of the primary key will become a hot spot for contention because all insertions occur here. Concurrent insertions will cause gap lock contention.

③: The Auto_Increment lock mechanism will cause the snatch of the auto-increment lock, resulting in a certain performance loss

Appendix: To improve the lock contention problem of Auto_increment, you need to tune the configuration of innodb_autoinc_lock_mode

Three: Summary

This blog starts with the question raised at the beginning, creating a table and using jdbcTemplate to test the performance of different ID generation strategies in inserting large amounts of data. It then analyzes the different ID mechanisms in the MySQL index structure and their advantages and disadvantages, and explains in depth why UUID and random non-repeating IDs have performance loss in data insertion, and explains this issue in detail. In actual development, it is best to use auto-increment ID according to the official recommendation of MySQL. MySQL is profound and there are still many points worth optimizing inside that we need to learn.

Appendix: This blog demo address: https://gitee.com/Yrion/mysqlIdDemo

This concludes this article on in-depth analysis of why MySQL does not recommend using UUID or Snowflake ID as primary key. For more relevant content about MySQL UUID or Snowflake ID as primary key, please search 123WORDPRESS.COM's previous articles or continue to browse the following related articles. I hope everyone will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • A brief analysis of whether MySQL primary key uses numbers or uuids for faster query
  • MySQL method of generating random numbers, strings, dates, verification codes and UUIDs
  • The difference and advantages and disadvantages of Mysql primary key UUID and auto-increment primary key
  • How to modify server uuid in Mysql
  • How to remove horizontal lines when storing UUID in Mysql
  • Why does Mysql analyze and design table primary key not use uuid?

<<:  Linux uses bond to implement dual network cards to bind a single IP sample code

>>:  More elegant processing of dates in JavaScript based on Day.js

Recommend

Modularity in Node.js, npm package manager explained

Table of contents The basic concept of modularity...

Summary of a CSS code that makes the entire site gray

In order to express the deep condolences of peopl...

Implementation of setting fixed IP when starting docker container

Network type after docker installation [root@insu...

Example code for CSS columns to achieve two-end alignment layout

1. Going around in circles After going around in ...

Vue implements login verification code

This article example shares the specific code of ...

Sample code for implementing PC resolution adaptation in Vue

Table of contents plan Install Dependencies Intro...

How to compile and install PHP and Nginx in Ubuntu environment

This article describes how to compile and install...

How to set up remote access to a server by specifying an IP address in Windows

We have many servers that are often interfered wi...

Reduce memory and CPU usage by optimizing web pages

Some web pages may not look large but may be very ...

What is JavaScript anti-shake and throttling

Table of contents 1. Function debounce 1. What is...

Example code for using Nginx to implement 301 redirect to https root domain name

Based on SEO and security considerations, a 301 r...

The best way to solve the 1px border on mobile devices (recommended)

When developing for mobile devices, you often enc...

How to reference jQuery in a web page

It can be referenced through CDN (Content Delivery...