Analyzing MySQL Large-Table Extraction Modes from the Perspective of a Cloud Data Migration Service


Summary: Which fetch mode should MySQL JDBC extraction use? Let me tell you.

I was recently put through the wringer by MySQL extraction modes in a cloud migration project. First customers complained about memory overflow, then they complained again about poor migration throughput. So which fetch mode should MySQL JDBC extraction use? Let me tell you.

1.1 Java-JDBC Communication Principles

Communication between JDBC and the database takes place over a socket. The general process is shown in the figure below: MySQL Server → server-side kernel socket buffer → client-side kernel socket buffer → the JVM where JDBC runs.

1.2 Three modes of JDBC reading data

1.2.1 Method 1: Read data using JDBC default parameters

It is mainly divided into the following steps:

1) MySQL Server writes data into the server-side kernel socket buffer through an OutputStream. This is a memory copy.

2) When there is data in the server-side kernel buffer, it is transmitted over the TCP connection to the kernel buffer of the machine where the socket client runs.

3) The JVM where JDBC runs uses an InputStream to read the local kernel buffer into JVM memory. If there is no data, the read blocks.

Steps 1, 2, and 3 then repeat continuously. The problem is that in default mode the client JVM reads from the kernel buffer as fast as it can, without regard to available local memory. If the result set is too large, this triggers full GC and then memory overflow.

Referring to the JDBC API docs, a Java demo of the default mode follows.
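The original demo code was not preserved; below is a minimal sketch of a default-mode read. The URL, credentials, and table name (`test`, `user`, `password`, `big_table`) are placeholders, not values from the original article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DefaultReadDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; adjust for your environment.
        String url = "jdbc:mysql://127.0.0.1:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             // Default mode: the driver buffers the ENTIRE result set in JVM
             // memory before rows are handed out, which is what causes the
             // full GC / OOM on large tables described above.
             ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
            while (rs.next()) {
                // process one row
            }
        }
    }
}
```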

1.2.2 Method 2: Cursor Query

To solve the memory overflow problem of method 1, the JDBC driver provides a cursor parameter: add useCursorFetch=true when establishing the JDBC connection. With the cursor enabled, JDBC tells the server how many rows to fetch per round trip, avoiding memory overflow. The communication process is shown in the figure below.

Although method 2's cursor query solves the memory overflow problem, it is highly dependent on network quality. If each round trip gains just 10 ms of latency, 100,000 round trips take 1,000 seconds longer, and that is only the RT of each request: every TCP message also requires an ACK to guarantee reliability, and each fetch of 100 rows (the requested row count is configurable) involves multiple communications, further amplifying the cost of increased latency. In addition, under a cursor query MySQL cannot predict when the query will finish; to cope with its own concurrent DML operations, it creates a local temporary space to hold the data to be extracted. As a result, the following phenomena occur during a cursor query:

a. IOPS soars. MySQL writes data into the temporary space and reads it back out during transmission, triggering a large number of IO operations.

b. Disk usage soars. The temporary space lives for the entire JDBC read phase and is not reclaimed by MySQL until the client calls ResultSet.close().

c. CPU and memory usage rise by a certain proportion.
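The latency amplification described above can be checked with back-of-the-envelope arithmetic. The 10 ms and 100-rows-per-fetch figures are the article's own example; the 10-million-row table size is an assumption for illustration:

```java
public class CursorLatencyEstimate {
    public static void main(String[] args) {
        long rows = 10_000_000L;   // assumed table size (not from the article)
        long fetchSize = 100;      // rows per round trip (article's example)
        double extraRttMs = 10.0;  // added network latency per round trip

        // One round trip per fetch batch: 10,000,000 / 100 = 100,000 trips.
        long roundTrips = rows / fetchSize;
        // 100,000 trips x 10 ms = 1,000,000 ms = 1,000 extra seconds.
        double extraSeconds = roundTrips * extraRttMs / 1000.0;
        System.out.println("Extra time: " + extraSeconds + " s"); // 1000.0 s
    }
}
```

Note this counts only the added round-trip latency; TCP ACKs and multiple packets per batch make the real cost higher.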

For the principles behind cursor queries, see the blog post "MySQL JDBC StreamResult Communication Principle Analysis" and the JDBC driver source code; they are not repeated in this article.

Referring to the JDBC API docs, a Java demo of cursor mode follows.
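The original demo code was not preserved; here is a minimal sketch of a cursor-mode read. The URL, credentials, table name, and fetch size of 100 are placeholders rather than the article's original values; the key points are useCursorFetch=true in the URL and an explicit setFetchSize.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorFetchDemo {
    public static void main(String[] args) throws Exception {
        // useCursorFetch=true turns on server-side cursor fetching.
        String url = "jdbc:mysql://127.0.0.1:3306/test?useCursorFetch=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement("SELECT * FROM big_table")) {
            // Rows fetched per round trip; each batch costs a network round trip.
            ps.setFetchSize(100);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process one row
                }
            }
        }
    }
}
```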

1.2.3 Method 3: Stream Reading Data

Method 1 can overflow JVM memory. Method 2 avoids full GC, but its communication efficiency is low, and it drives up IOPS and disk usage on the MySQL server. We therefore introduce streaming reads. Streaming must be enabled before reading the ResultSet.

Method 3 performs no extra server-client negotiation before the transfer, avoiding the communication inefficiency. The server prepares data and writes it into the server's kernel buffer; the data travels over the TCP connection into the client's kernel buffer, and the client's inputStream.read() is then woken up to read it. Unlike method 1, the client reads only one packet's worth of data at a time; if a packet does not contain a complete row, it reads another packet. When the client consumes data more slowly than it arrives, the client-side kernel buffer fills up, then the server-side kernel buffer fills up as well, blocking the server's OutputStream. JDBC in stream mode thus acts like a water pipe connecting two reservoirs, and the client and server reach an equilibrium.

For the JDBC client, reading from the kernel buffer each time is far more efficient than method 2, and reading a small amount at a time will not overflow JVM memory. For the server, MySQL writes straight to the kernel buffer each time; no temporary space is created and no extra IO reads are involved, so pressure on the server also drops. Of course, method 3 has its own problems, such as the difficulty of cancelling a read while streaming, and the fact that cancel is non-blocking.

Refer to the JDBC API docs. Many online tutorials say you must set useCursorFetch=true, ResultSet.FETCH_REVERSE, and so on. In fact, after studying the JDBC driver source code, the editor found that setting fetchSize=Integer.MIN_VALUE is all that is needed; everything else can stay at the default configuration. A Java demo of streaming mode follows.
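The original demo code was not preserved; below is a minimal sketch of a streaming read. The URL, credentials, and table name are placeholders; the essential part, per the paragraph above, is fetchSize=Integer.MIN_VALUE on a forward-only, read-only statement, with no useCursorFetch in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamReadDemo {
    public static void main(String[] args) throws Exception {
        // Plain URL: no useCursorFetch needed for streaming.
        String url = "jdbc:mysql://127.0.0.1:3306/test";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // Integer.MIN_VALUE is the driver's switch for row-by-row streaming:
            // rows arrive packet by packet instead of being buffered in the JVM.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM big_table")) {
                while (rs.next()) {
                    // process one row
                }
            }
        }
    }
}
```

Note that while streaming, the connection cannot issue other statements until the ResultSet is fully read or closed.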

1.3 Configuring the Three Modes in the Cloud Data Migration Service

Cloud Data Migration (CDM) is a migration tool on Huawei Cloud; for details, see the CDM official website. The editor will use CDM to show how to switch among the three extraction modes. CDM uses mode 3, streaming extraction, by default; switching to mode 1 or mode 2 requires extra configuration.

1.3.1 Configuration method 1: Default reading

Create a new MySQL connector (see the official website for how), then add useCursorFetch=false and adopt.stream=false to the advanced properties.

1.3.2 Configuration method 2: cursor query

Edit the MySQL connector and add useCursorFetch=true and adopt.stream=false to the advanced properties. The cursor batch size can be adjusted via Fetch Size in the UI; the default is 1000.

1.3.3 Configuration method 3: Streaming

CDM uses streaming mode by default, so no additional configuration is required. Note that in stream mode the Fetch Size in the UI has no effect; see the previous section for the reason.

1.3.4 Performance comparison

Create a CDM migration job for Mysql2Hive. The source table has 101 fields and 1 million rows of data. The configuration is as follows:

Method 1: Writing 1 million rows of data takes 1m22s

Method 2: Also write 1 million rows, with fetchSize set to 1, 10, 100, and 1000 in turn; the best time was 2m1s

Method 3: Also write 1 million rows, taking 1m5s

On this small 1-million-row table, methods 1 and 3 are clearly much faster than method 2. The editor also tested a large table with 10 million rows: method 1 ran out of memory, method 2 migrated successfully but took more than 20 minutes, and method 3 still finished within 15 minutes.

This concludes this article's analysis of MySQL large-table extraction modes from the perspective of a cloud data migration service.

