Importing data with MySQL Shell's import_table utility

1. Introduction to import_table

In the previous technical sharing we covered four common ways of using MySQL Load Data to import text data into MySQL. In this installment we introduce a more efficient data import tool from the MySQL Shell tool set: import_table, whose full name is the Parallel Table Import Utility. As the name suggests, it imports data concurrently, and since MySQL Shell 8.0.23 its feature set has been quite complete. Its core capabilities are listed below, followed by a minimal invocation sketch.

  • Covers essentially all the functionality of MySQL Load Data and can serve as a substitute for it
  • Imports concurrently by default (with a configurable chunk size)
  • Supports wildcard matching to import multiple files into one table at once (ideal for aggregating files with the same structure into a single table)
  • Supports rate limiting (useful when import bandwidth must be kept in check)
  • Supports compressed files
  • Supports importing into MySQL 5.7 and above
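
One prerequisite applies to every example in this article: import_table streams data through the client connection, so the target server must have local_infile enabled. Below is a minimal invocation sketch from a MySQL Shell Python session; the connection URI, file path, and schema/table names are placeholders for your environment:

## Minimal sketch (MySQL Shell, Python mode); start the shell with e.g.: mysqlsh root@127.0.0.1:3306 --py

# import_table requires local_infile=ON on the target server (needs sufficient privileges)
session.run_sql("SET GLOBAL local_infile = ON")

# Simplest form: one file, CSV dialect, everything else left at its defaults
util.import_table(
    "/data/mysql/3306/tmp/employees_01.csv",
    {"schema": "employees", "table": "emp", "dialect": "csv-unix"}
)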

2. Load Data and import_table function examples

This section pairs import_table commands with their Load Data equivalents. As before, we use the employees sample data and a comprehensive import scenario that covers:

  • Importing columns in a custom order
  • Transforming data with functions
  • Generating custom column values
## The sample data is as follows
[root@10-186-61-162 tmp]# cat employees_01.csv
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10004","1954-05-01","Christian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"

## Sample table structure
10.186.61.162:3306 employees SQL > desc emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(14)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
| full_name   | varchar(64)   | YES  |     | NULL    |       | -- new column, not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       | -- new column, not present in the exported data file
| delete_flag | varchar(1)    | YES  |     | NULL    |       | -- new column, not present in the exported data file
+-------------+---------------+------+-----+---------+-------+
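
For anyone reproducing this setup: full_name, modify_date, and delete_flag are not part of the employees sample data, so a stock emp table would need them added first. A hypothetical DDL along these lines (run through MySQL Shell's Python mode; the column placement mirrors the DESC output above) would do it:

# Hypothetical DDL: add the three columns that do not exist in the exported data file
session.run_sql("""
    ALTER TABLE employees.emp
        ADD COLUMN full_name   varchar(64) NULL AFTER last_name,
        ADD COLUMN modify_date datetime    NULL AFTER hire_date,
        ADD COLUMN delete_flag varchar(1)  NULL AFTER modify_date
""")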

2.1 Importing data using Load Data

The meaning of each parameter is not explained here; for the syntax rules and what they mean, refer to the earlier article in this series, <Multiple uses of MySQL Load Data>.

load data infile '/data/mysql/3306/tmp/employees_01.csv'
into table employees.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3),
    last_name=lower(@C4),
    full_name=concat(first_name,' ',last_name),
    gender=@C5,
    hire_date=@C6,
    modify_date = now(),
    delete_flag=if(hire_date<'1988-01-01','Y','N'); 
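
In this statement, @C1 through @C6 are user variables that capture the six file columns in order; the SET clause then transforms them before they are assigned to table columns. import_table reproduces exactly this mapping with its columns and decodeColumns options, as shown next.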

2.2 Importing data using import_table

util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    }) 
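
Either route should leave the same rows behind. A quick spot check of the transformed columns (a sketch, assuming the import above succeeded):

# Spot-check the transformed and generated columns after either import
res = session.run_sql(
    "SELECT emp_no, first_name, last_name, full_name, delete_flag"
    " FROM employees.emp ORDER BY emp_no LIMIT 3"
)
for row in res.fetch_all():
    print(row)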

3. Features specific to import_table

3.1 Multi-file import (wildcard matching)

## Three separate employees files with identical structure were generated before importing
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import command using the wildcard pattern employees_*
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_*",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    })
    
## Import command listing every file path explicitly
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
        "/data/mysql/3306/tmp/employees_02.csv",
        "/data/mysql/3306/tmp/employees_03.csv"
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6], ## Use as many serial numbers as there are columns in the file "decodeColumns": {
            "emp_no": "@1", ## corresponds to the first column in the file"birth_date": "@2", ## corresponds to the second column in the file"first_name": "upper(@3)", ## corresponds to the third column in the file, and converts it to uppercase"last_name": "lower(@4)", ## corresponds to the fourth column in the file, and converts it to uppercase"full_name": "concat(@3,' ',@4)", ## merges the third and fourth columns in the file into one column to generate the field value in the table"gender": "@5", ## corresponds to the fifth column in the file"hire_date": "@6", ## corresponds to the sixth column in the file"modify_date": "now()", ## uses a function to generate the field value in the table"delete_flag": "if(@6<'1988-01-01','Y','N')" ## Make a logical judgment based on the sixth column in the file and generate the corresponding field value in the table}
    }) 
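
Both invocations are equivalent here: the wildcard pattern is expanded when the import starts, and every matched file is loaded into the same table, so the files must share the same column layout and dialect.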

3.2 Concurrent import

Before experimenting with concurrent import, we create an sbtest1 table with 10 million rows (about 2 GB of data) to give the import something substantial to work on. import_table controls concurrency with the threads option, which defaults to 8.

## The sbtest1 data required for the test has already been exported
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv
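
For reference, an export in exactly the format the csv-unix dialect expects can be produced with something like the following (a sketch; SELECT ... INTO OUTFILE writes on the server host, and the path must be permitted by the server's secure_file_priv setting):

# Hypothetical export producing sbtest1.csv in csv-unix format
session.run_sql("""
    SELECT * FROM demo.sbtest1
    INTO OUTFILE '/data/mysql/3306/tmp/sbtest1.csv'
    CHARACTER SET utf8mb4
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
""")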

## Set threads to 8 for eight-way concurrency
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "8"
    }) 
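
With the default bytesPerChunk of 50M, the 1.9G file is split into roughly 40 chunks, enough to keep all 8 threads busy; a file smaller than a single chunk would effectively be imported by one thread no matter how many are configured.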

3.3 Import rate control

maxRate and threads together bound the import rate: maxRate caps each thread individually, so with 4 threads at 2M/s per thread the total throughput will not exceed 8M/s.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "maxRate": "2M"
    }) 

3.4 Custom chunk size

The default chunk size (bytesPerChunk) is 50M. Lowering it reduces the size of each transaction; with a 1M chunk size, the amount of data each thread imports per batch shrinks accordingly.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "bytesPerChunk": "1M",
        "maxRate": "2M"
    }) 
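
Smaller chunks mean smaller transactions on the server, which eases pressure on the undo log and on replication at the cost of more commits overall; a value as low as 1M is mainly useful when large transactions are a problem for the target instance.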

4. Load Data vs import_table performance comparison

  • The same schema and table are used for both
  • No transformation is applied to the data; it is imported as is
  • Default options are kept; only the required parameters are specified
-- Load Data statement
load data infile '/data/mysql/3306/tmp/sbtest1.csv'
into table demo.sbtest1
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

## import_table statement
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4"
    }) 

In this test, Load Data took about 5 minutes, while import_table completed the same import in less than half that time, making it more than twice as fast (in a virtual-machine environment with limited disk I/O).

5. Technical Summary

  • import_table covers almost all the functionality of Load Data
  • import_table imports data more efficiently than Load Data
  • import_table offers fine-grained control over import rate, concurrency, and chunk size
  • import_table produces a more detailed progress report, useful for troubleshooting and time estimation, including:
    • Import rate
    • Total import time
    • Data volume per batch and any Warnings
    • A final summary report
