MySQL Quick Data Comparison Techniques

MySQL Quick Data Comparison Techniques

In MySQL operation and maintenance, a R&D colleague wants to compare the data on two different instances and find the differences. In addition to the primary key, every field needs to be compared. How to do it?

The first solution is to write a program to extract each row of data from the two instances for comparison. This is theoretically feasible, but the comparison time is long.

The second solution is to merge all the fields in each row of data, take the checksum value, and then compare according to the checksum value. This seems feasible, so give it a try.

First, we need to merge the values ​​of all fields and use the CONCAT function provided by MySQL. If the CONCAT function contains NULL values, the final result will be NULL. Therefore, we need to use the IFNULL function to replace the NULL values, such as:

CONCAT(IFNULL(C1,''),IFNULL(C2,''))

The table you are joining has many rows, and it is tiring to manually compose a script. Don't worry, you can use information_schema.COLUMNS to handle it:

## Get the concatenated string of column names SELECT
GROUP_CONCAT('IFNULL(',COLUMN_NAME,','''')')
FROM information_schema.COLUMNS 
WHERE TABLE_NAME='table_name';

Assume we have test table:

CREATE TABLE t_test01
(
 id INT AUTO_INCREMENT PRIMARY KEY,
 C1 INT,
 C2 INT
)

We can then splice out the following SQL:

SELECT
id,
MD5(CONCAT(
IFNULL(id,''),
IFNULL(c1,''),
IFNULL(c2,''),
)) AS md5_value
FROM t_test01

Execute it on two instances, and then compare the results using Beyond Compare. It is easy to find the different rows and primary key IDs.

For tables with large amounts of data, the result set is also large and difficult to compare. So first try to reduce the result set. You can combine the md5 values ​​of multiple rows to calculate the MD5 value. If the final MD5 values ​​are the same, then these rows are the same. If they are different, it proves that there are differences. Then compare these rows line by line.

Assume that we compare in groups of 1,000 rows. If we need to merge the grouped results, we need to use the GROUP_CONCAT function. Note that we need to add sorting in the GROUP_CONCAT function to ensure the order of the merged data. The SQL is as follows:

SELECT
min(id) as min_id,
max(id) as max_id,
count(1) as row_count,
MD5(GROUP_CONCAT(
MD5(CONCAT(
IFNULL(id,''),
IFNULL(c1,''),
IFNULL(c2,''),
)) ORDER BY id
))AS md5_value
FROM t_test01
GROUP BY (id div 1000)

The execution results are:

min_id max_id row_count md5_value
0 999 1000 7d49def23611f610849ef559677fec0c
1000 1999 1000 95d61931aa5d3b48f1e38b3550daee08
2000 2999 1000 b02612548fae8a4455418365b3ae611a
3000 3999 1000 fe798602ab9dd1c69b36a0da568b6dbb

When there are fewer different data, even if we need to compare tens of millions of data, we can easily locate the 1,000 data with differences based on min_id and max_id, and then compare the MD5 values ​​line by line to finally find the different rows.

Final comparison chart:

PS:

When using GROUP_CONCAT, you need to configure the MySQL variable group_concat_max_len. The default value is 1024, and the excess will be staged.

You may also be interested in:
  • MySQL 5.7.20 common download, installation and configuration methods and simple operation skills (decompression version free installation)
  • How to use Java Web to connect to MySQL database
  • How to use tcpdump to capture packets for mysql
  • Detailed explanation of 30 SQL query optimization techniques for MySQL tens of millions of large data
  • Mysql optimization techniques for querying dates based on time
  • 10 SQL statement optimization techniques to improve MYSQL query efficiency
  • Summary of common problems and application skills in MySQL
  • 5 Tips for Protecting Your MySQL Data Warehouse
  • Share 101 MySQL debugging and optimization tips
  • MySql Sql optimization tips sharing
  • Summary of MySQL injection bypass filtering techniques
  • Summary of common operation skills of MySQL database

<<:  How to implement nginx smooth restart

>>:  JS Decorator Pattern and TypeScript Decorators

Recommend

What to do after installing Ubuntu 20.04 (beginner's guide)

Ubuntu 20.04 has been released, bringing many new...

Vue implements dynamic routing details

Table of contents 1. Front-end control 1. In the ...

How to create a view on multiple tables in MySQL

In MySQL, create a view on two or more base table...

How to implement import and export mysql database commands under linux

1. Export the database using the mysqldump comman...

Solution for Baidu site search not supporting https (tested)

Recently, https has been enabled on the mobile ph...

How to configure SSL for koa2 service

I. Introduction 1: SSL Certificate My domain name...

Installation process of MySQL5.7.22 on Mac

1. Use the installation package to install MySQL ...

Tutorial on downloading, installing, configuring and using MySQL under Windows

Overview of MySQL MySQL is a relational database ...

Docker image compression and optimization operations

The reason why Docker is so popular nowadays is m...

js to achieve the complete steps of Chinese to Pinyin conversion

I used js to create a package for converting Chin...

A screenshot demo based on canvas in html

Written at the beginning I remember seeing a shar...

How to limit the number of concurrent connection requests in nginx

Introduction The module that limits the number of...

How to access the local machine (host machine) in Docker

Question How to access the local database in Dock...

Introduction to MySQL overall architecture

The overall architecture of MySQL is divided into...