A MySQL migration plan and practical record of pitfalls

Background

Our historical business data is stored in MySQL, including an operation record table, video_log. Whenever a user creates or updates a video, or an auditor reviews one, a log row is appended to video_log; the table is insert-only. As you can imagine, one video corresponds to many logs. With 100,000 videos a day and an average of 5 logs per video, that is 500,000 logs a day, 500,000 × 30 = 15 million records a month, and 15 million × 12 = 180 million records a year. There are currently more than 200 million rows online. Since the log is not consumer-facing and is only used to troubleshoot problems, a slight delay can be tolerated. However, as time goes by, queries will inevitably get slower and slower and hurt efficiency, so we proposed a redesign.

Solution 1: Back up old data

Since the log itself is not the most critical data, yet it still needs to be queryable in near real time (for troubleshooting live problems), the initial idea was to keep the core storage unchanged and migrate the older data out. After all, the probability of suddenly needing an operation record from a year ago is very small, and if it is needed, it can be fetched from offline storage. In terms of design, all we need is a scheduled script that extracts old data at around 4 am every day (the business off-peak period) and reports it to offline storage (most companies have a Hive-based data warehouse or similar), so that the online video_log table stops growing.
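A minimal sketch of such a nightly job in Python, assuming pymysql, a created_at column on video_log, and a hypothetical report_to_warehouse() uploader to the data warehouse:

```python
import pymysql

BATCH_SIZE = 1000
RETENTION_DAYS = 365  # assumed policy: keep one year of logs online

def archive_old_logs(conn):
    """Move expired video_log rows to offline storage, one batch at a time."""
    with conn.cursor() as cur:
        while True:
            # Pull the lowest-id batch of expired rows.
            cur.execute(
                "SELECT * FROM video_log"
                " WHERE created_at < NOW() - INTERVAL %s DAY"
                " ORDER BY id LIMIT %s",
                (RETENTION_DAYS, BATCH_SIZE),
            )
            rows = cur.fetchall()
            if not rows:
                break
            report_to_warehouse(rows)  # hypothetical uploader (e.g. to Hive)
            last_id = rows[-1][0]      # id is assumed to be the first column
            # Delete only what was just uploaded, in small batches,
            # so the 4 am job never holds long locks.
            cur.execute(
                "DELETE FROM video_log"
                " WHERE id <= %s AND created_at < NOW() - INTERVAL %s DAY",
                (last_id, RETENTION_DAYS),
            )
            conn.commit()
```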

Solution 2: Split tables

Splitting tables is another option. Compared with Solution 1, its advantage is that all data remains queryable in real time; its disadvantage is that code must be modified.

  • First, confirm the sharding key. Since every video_log row is bound to a video, video_id is the natural choice of sharding key.
  • Next, confirm how many tables to split into. Set a small goal first: hold out for 3 years with at most 100 million rows per table (our queries are simple). From the statistics above, three years gives roughly 3 × 180 million = 540 million rows, so we need 540 million / 100 million = 5.4, i.e. about 6 tables. A routing sketch follows this list.
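A minimal sketch of the routing, assuming the shard tables are named video_log_0 through video_log_5:

```python
NUM_SHARDS = 6  # from the capacity estimate above

def shard_table(video_id: int) -> str:
    # The same video_id always maps to the same table, so all logs
    # of one video can be read from a single shard.
    return f"video_log_{video_id % NUM_SHARDS}"

assert shard_table(123) == "video_log_3"
```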

The next step is to modify the code to handle reading and writing both the old and new tables:

  • New data is inserted directly into the new shard tables.
  • Since the log table only has insert operations, there are no update or delete scenarios to consider.
  • After the split, a video's logs temporarily live in two places (the old table and the new shard table), so we query both tables and merge the results (see the sketch after this list).
  • Synchronize the old data into the new tables.
  • Finally, retire the code path that reads the old table.
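A minimal sketch of that transitional read path, reusing shard_table() from the sketch above and assuming pymysql plus columns (id, video_id, action, created_at):

```python
def query_logs(conn, video_id: int):
    """Fetch a video's logs from both the old table and its new shard."""
    new_table = shard_table(video_id)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, video_id, action, created_at FROM video_log"
            " WHERE video_id = %s", (video_id,))
        old_rows = cur.fetchall()
        cur.execute(
            f"SELECT id, video_id, action, created_at FROM {new_table}"
            " WHERE video_id = %s", (video_id,))
        new_rows = cur.fetchall()
    # The table is insert-only, so a merge sorted by created_at
    # (index 3 in the assumed column order) is all we need.
    return sorted(old_rows + new_rows, key=lambda row: row[3])
```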

Solution 3: Migrate to TiDB

The drawback of Solution 2 is quite obvious: what do we do after 3 years? Split the tables again? It feels like a piece of technical debt that never goes away. So we set our sights on TiDB, a distributed database. Once we move to TiDB we no longer need to worry about splitting tables: TiDB does all of that for us and scales out by adding nodes on its own. Because it is distributed, TiDB's auto-allocated primary keys are not continuous, a detail that matters a great deal later.
The whole process can be roughly divided into the following 4 steps:

  1. Start double writing (record the MySQL id at the moment double writing begins; any data at or below this id must be old data)
  2. Synchronize the old data (identified by the id recorded in step 1)
  3. Switch reads to TiDB (once old-data synchronization is complete)
  4. Stop double writing (TiDB becomes the only store)
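A minimal sketch of the write path during step 1, assuming pymysql connections to both stores (TiDB speaks the MySQL protocol, so the same driver works):

```python
def write_log(mysql_conn, tidb_conn, video_id: int, action: str):
    """Double write: every new log row goes to MySQL and to TiDB."""
    sql = "INSERT INTO video_log (video_id, action) VALUES (%s, %s)"
    with mysql_conn.cursor() as cur:
        cur.execute(sql, (video_id, action))
    mysql_conn.commit()
    with tidb_conn.cursor() as cur:
        # The primary key is not copied; TiDB allocates its own.
        cur.execute(sql, (video_id, action))
    tidb_conn.commit()
```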

Migrating to TiDB looks simple, but several pitfalls hide in the job script that synchronizes the old data, so let's focus on those.

  • You need to consider what happens if the job dies midway and has to be restarted. Apart from the time cost of re-running, rows that were already synchronized would be inserted again, so duplicate data has to be handled. One way to solve it is to add a field to the old table marking whether a row has been synchronized, and update it after each sync. Drawback: the online table is huge, and adding a field to it is unsafe and may block online traffic.
  • Since adding a field is a bad idea, reuse the existing primary key id as the uniqueness constraint and synchronize the primary key along with each row. Then even if the script restarts from the beginning, re-inserting an already-copied primary key simply errors out and is skipped. It looks perfect, but TiDB is distributed and its primary key ids are not continuous, so this situation can occur: the primary key id that TiDB assigns to freshly double-written business data collides with a primary key id synchronized from MySQL, and whichever insert comes second is guaranteed to fail (see the sketch after this list).
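A minimal sketch of that rejected approach, to make the trap concrete (pymysql raises IntegrityError on a duplicate key):

```python
import pymysql

def sync_row_with_pk(tidb_conn, row):
    """Copy one old row, primary key included, relying on
    duplicate-key errors to make restarts idempotent."""
    try:
        with tidb_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO video_log (id, video_id, action, created_at)"
                " VALUES (%s, %s, %s, %s)", row)
        tidb_conn.commit()
    except pymysql.err.IntegrityError:
        # Looks safe: a rerun just skips rows already copied. The hidden
        # trap: TiDB's own allocator can hand a fresh double-written row
        # an id equal to an old MySQL id, and then one of the two inserts
        # is lost right here.
        tidb_conn.rollback()
```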

Final synchronization script solution

Taking into account data duplication, the cost of job restarts, and the overall synchronization throughput, I roughly made the following plan:

  1. Improve efficiency by dividing the work into batches: first split the old data into about 10 batches by id range, sized according to processing capacity and the expected completion time. 10 jobs each run their own batch without interfering with one another, and each round copies 100 rows.
  2. Record the progress and resume from the breakpoint after a restart: after each page is synchronized, record the current position (the current id) in Redis. Even after a restart, the job reads the last position from Redis and continues from there.
  3. Avoid primary key conflicts: synchronize all fields except the primary key (let TiDB allocate its own), as shown in the sketch below.
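A minimal sketch of one such batch worker, assuming pymysql, redis-py, and the column layout used above; the Redis key scheme is hypothetical:

```python
import pymysql
import redis

PAGE = 100  # rows copied per round, as in the plan

def run_batch(mysql_conn, tidb_conn, r, batch_no, start_id, end_id):
    """Backfill old rows in (start_id, end_id], checkpointing in Redis."""
    ckpt_key = f"video_log:sync:batch:{batch_no}"
    last_id = int(r.get(ckpt_key) or start_id)  # resume after a restart
    while last_id < end_id:
        with mysql_conn.cursor() as cur:
            cur.execute(
                "SELECT id, video_id, action, created_at FROM video_log"
                " WHERE id > %s AND id <= %s ORDER BY id LIMIT %s",
                (last_id, end_id, PAGE))
            rows = cur.fetchall()
        if not rows:
            break
        with tidb_conn.cursor() as cur:
            # Drop the old primary key; TiDB allocates its own, so the
            # backfill can never collide with live double-written ids.
            cur.executemany(
                "INSERT INTO video_log (video_id, action, created_at)"
                " VALUES (%s, %s, %s)",
                [row[1:] for row in rows])
        tidb_conn.commit()
        last_id = rows[-1][0]
        r.set(ckpt_key, last_id)  # checkpoint only after the page commits
```

A crash between the commit and the checkpoint can replay at most one page of 100 rows; checkpointing before the commit would risk losing a page instead.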

In the end, the data migration was completed smoothly through the four cutover steps of Solution 3 plus the efficient synchronization script.

Summary

This concludes this article on MySQL migration plans and the pitfalls encountered along the way.
