Analyzing MySQL Large-Table Extraction Modes from the Perspective of a Cloud Data Migration Service


Summary: Which fetch mode should MySQL JDBC extraction use? Let me tell you.

I was recently put through the wringer by MySQL extraction modes in a cloud migration project. First customers complained about memory overflow, then they complained again about poor migration throughput. So which fetch mode should MySQL JDBC extraction use? Let me tell you.

1.1 Java-JDBC Communication Principles

Communication between JDBC and the database takes place over a socket. The general process is shown in the figure below: MySQL Server → server-side kernel socket buffer → client-side kernel socket buffer → the JVM where JDBC runs.

1.2 Three modes of JDBC reading data

1.2.1 Method 1: Read data using JDBC default parameters

It is mainly divided into the following steps:

1) MySQL Server writes data into the server-side kernel socket buffer through an OutputStream. This is a memory copy.

2) When there is data in the server-side kernel buffer, it is transmitted over the TCP connection to the kernel buffer of the machine where the socket client runs.

3) The JVM where JDBC runs uses an InputStream to read the local kernel buffer into JVM memory. If there is no data, the read blocks.

Steps 1, 2, and 3 then repeat continuously. The problem is that in default mode the client JVM reads from the kernel buffer as fast as it can, without regard to available local memory. If the result set is too large, this triggers full GC and then memory overflow.

Referring to the JDBC API docs, a Java demo of the default mode follows.
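The original demo code was not preserved; below is a minimal sketch of a default-mode read. The URL, credentials, and table name (`test`, `user`, `password`, `big_table`) are placeholders, not values from the original article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DefaultReadDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; adjust for your environment.
        String url = "jdbc:mysql://127.0.0.1:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // Default mode: the driver buffers the ENTIRE result set in JVM
             // memory before rows are handed out, which is what causes the
             // full GC / OOM on large tables described above.
             ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // process one row
            }
        }
    }
}
```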

1.2.2 Method 2: Cursor Query

To solve the memory overflow problem of method 1, the JDBC driver provides a cursor parameter: add useCursorFetch=true when establishing the JDBC connection. With the cursor enabled, JDBC tells the server how many rows to fetch per round trip, avoiding memory overflow. The communication process is shown in the figure below.

Although method 2's cursor query solves the memory overflow problem, it is highly dependent on network quality. If each round trip gains just 10 ms of latency, 100,000 round trips take 1,000 seconds longer, and that is only the RT of each request: every TCP message also requires an ACK to guarantee reliability, and each fetch of 100 rows (the requested row count is configurable) involves multiple communications, further amplifying the cost of increased latency. In addition, under a cursor query MySQL cannot predict when the query will finish; to cope with its own concurrent DML operations, it creates a local temporary space to hold the data to be extracted. As a result, the following phenomena occur during a cursor query:

a. IOPS soars. MySQL writes data into the temporary space and reads it back out during transmission, triggering a large number of IO operations.

b. Disk usage soars. The temporary space lives for the entire JDBC read phase and is not reclaimed by MySQL until the client calls ResultSet.close().

c. CPU and memory usage rise by a certain proportion.
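The latency amplification described above can be checked with back-of-the-envelope arithmetic. The 10 ms and 100-rows-per-fetch figures are the article's own example; the 10-million-row table size is an assumption for illustration:

```java
public class CursorLatencyEstimate {
    public static void main(String[] args) {
        long rows = 10_000_000L;   // assumed table size (not from the article)
        long fetchSize = 100;      // rows per round trip (article's example)
        double extraRttMs = 10.0;  // added network latency per round trip

        // One round trip per fetch batch: 10,000,000 / 100 = 100,000 trips.
        long roundTrips = rows / fetchSize;
        // 100,000 trips x 10 ms = 1,000,000 ms = 1,000 extra seconds.
        double extraSeconds = roundTrips * extraRttMs / 1000.0;
        System.out.println("Extra time: " + extraSeconds + " s"); // 1000.0 s
    }
}
```

Note this counts only the added round-trip latency; TCP ACKs and multiple packets per batch make the real cost higher.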

For the principles behind cursor queries, see the blog post "MySQL JDBC StreamResult Communication Principle Analysis" and the JDBC driver source code; they are not repeated in this article.

Referring to the JDBC API docs, a Java demo of cursor mode follows.
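The original demo code was not preserved; here is a minimal sketch of a cursor-mode read. The URL, credentials, table name, and fetch size of 100 are placeholders rather than the article's original values; the key points are useCursorFetch=true in the URL and an explicit setFetchSize.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchDemo {
    public static void main(String[] args) throws Exception {
        // useCursorFetch=true turns on server-side cursor fetching.
        String url = "jdbc:mysql://127.0.0.1:3306/test?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM big_table")) {
            // Rows fetched per round trip; each batch costs a network round trip.
            ps.setFetchSize(100);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one row
                }
            }
        }
    }
}
```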

1.2.3 Method 3: Stream Reading Data

Method 1 can overflow JVM memory. Method 2 avoids full GC, but its communication efficiency is low, and it drives up IOPS and disk usage on the MySQL server. We therefore introduce streaming reads. Streaming must be enabled before reading the ResultSet.

Method 3 performs no extra server-client negotiation before the transfer, avoiding the communication inefficiency. The server prepares data and writes it into the server's kernel buffer; the data travels over the TCP connection into the client's kernel buffer, and the client's inputStream.read() is then woken up to read it. Unlike method 1, the client reads only one packet's worth of data at a time; if a packet does not contain a complete row, it reads another packet. When the client consumes data more slowly than it arrives, the client-side kernel buffer fills up, then the server-side kernel buffer fills up as well, blocking the server's OutputStream. JDBC in stream mode thus acts like a water pipe connecting two reservoirs, and the client and server reach an equilibrium.

For the JDBC client, reading from the kernel buffer each time is far more efficient than method 2, and reading a small amount at a time will not overflow JVM memory. For the server, MySQL writes straight to the kernel buffer each time; no temporary space is created and no extra IO reads are involved, so pressure on the server also drops. Of course, method 3 has its own problems, such as the difficulty of cancelling a read while streaming, and the fact that cancel is non-blocking.

Refer to the JDBC API docs. Many online tutorials say you must set useCursorFetch=true, ResultSet.FETCH_REVERSE, and so on. In fact, after studying the JDBC driver source code, the editor found that setting fetchSize=Integer.MIN_VALUE is all that is needed; everything else can stay at the default configuration. A Java demo of streaming mode follows.
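The original demo code was not preserved; below is a minimal sketch of a streaming read. The URL, credentials, and table name are placeholders; the essential part, per the paragraph above, is fetchSize=Integer.MIN_VALUE on a forward-only, read-only statement, with no useCursorFetch in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamReadDemo {
    public static void main(String[] args) throws Exception {
        // Plain URL: no useCursorFetch needed for streaming.
        String url = "jdbc:mysql://127.0.0.1:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Integer.MIN_VALUE is the driver's switch for row-by-row streaming:
            // rows arrive packet by packet instead of being buffered in the JVM.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row
                }
            }
        }
    }
}
```

Note that while streaming, the connection cannot issue other statements until the ResultSet is fully read or closed.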

1.3 Configuring the Three Modes in the Cloud Data Migration Service

Cloud Data Migration (CDM) is a migration tool on Huawei Cloud; for details, see the CDM official website. The editor will use CDM to show how to switch among the three extraction modes. CDM uses mode 3, streaming extraction, by default; switching to mode 1 or mode 2 requires extra configuration.

1.3.1 Configuration method 1: Default reading

Create a new MySQL connector (see the official website for how), then add useCursorFetch=false and adopt.stream=false to the advanced properties.

1.3.2 Configuration method 2: cursor query

Edit the MySQL connector and add useCursorFetch=true and adopt.stream=false to the advanced properties. The cursor batch size can be adjusted via Fetch Size in the UI; the default is 1000.

1.3.3 Configuration method 3: Streaming

CDM uses streaming mode by default, so no additional configuration is required. Note that in stream mode the Fetch Size in the UI has no effect; see the previous section for the reason.

1.3.4 Performance comparison

Create a CDM migration job for Mysql2Hive. The source table has 101 fields and 1 million rows of data. The configuration is as follows:

Method 1: Writing 1 million rows of data takes 1m22s

Method 2: Also write 1 million rows, with fetchSize set to 1, 10, 100, and 1000 in turn; the best time was 2m1s

Method 3: Also write 1 million rows, taking 1m5s

On this small 1-million-row table, methods 1 and 3 are clearly much faster than method 2. The editor also tested a large table with 10 million rows: method 1 ran out of memory, method 2 migrated successfully but took more than 20 minutes, and method 3 still finished within 15 minutes.

This concludes this article's analysis of MySQL large-table extraction modes from the perspective of a cloud data migration service.

