MySQL Shell import_table data import

1. Introduction to import_table

In this article we introduce import_table, an efficient data import tool from the MySQL Shell utility set. Its full name is the Parallel Table Import Utility and, as that name suggests, it supports concurrent data import. The tool's feature set became fairly complete as of MySQL Shell 8.0.23. Its core features are:

  • Covers essentially all of MySQL LOAD DATA's functionality and can be used as a replacement for it
  • Concurrent import by default (with a configurable chunk size)
  • Wildcard matching, so multiple files can be imported into one table in a single run (well suited to aggregating files with the same structure into one table)
  • Rate limiting (well suited to scenarios where bandwidth usage must be controlled)
  • Compressed file handling
  • Import into MySQL 5.7 and later

2. Load Data and import_table examples

This section gives commands that accomplish the same task with import_table and with LOAD DATA. We again use the employees sample data to demonstrate a comprehensive LOAD DATA scenario:

  • Importing columns in a custom order
  • Applying functions to the data
  • Generating custom column values

The sample data is as follows:

[root@10-186-61-162 tmp]# cat employees_01.csv
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10004","1954-05-01","Christian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"
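The `csv-unix` dialect used in the import commands below means comma-separated fields, double-quoted values, and `\n` line endings. As a quick sanity check (a sketch, not part of the import itself), Python's standard csv module parses a sample line with exactly the same conventions:

```python
import csv
import io

# One row in the same format as employees_01.csv:
# comma-separated, every field wrapped in double quotes, LF line ending
sample = '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'

# csv.reader defaults (delimiter=',', quotechar='"') match the csv-unix dialect
row = next(csv.reader(io.StringIO(sample)))
print(row)

# Each line yields exactly 6 fields, one per source column
assert len(row) == 6
```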

Example table structure:

 10.186.61.162:3306  employees  SQL > desc emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(14)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
| full_name   | varchar(64)   | YES  |     | NULL    |       |  -- new field, not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       |  -- new field, not present in the exported data file
| delete_flag | varchar(1)    | YES  |     | NULL    |       |  -- new field, not present in the exported data file
+-------------+---------------+------+-----+---------+-------+

2.1 Importing data using Load Data

load data infile '/data/mysql/3306/tmp/employees_01.csv'
into table employees.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3),
    last_name=lower(@C4),
    full_name=concat(first_name,' ',last_name),
    gender=@C5,
    hire_date=@C6,
    modify_date = now(),
    delete_flag=if(hire_date<'1988-01-01','Y','N');
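The SET clause above can be checked offline before running the import. The following Python sketch (illustrative only; the column names come from the table structure shown earlier) applies the same transformations to one line of the sample file:

```python
import csv
import io

def transform(row):
    """Mirror the Load Data SET clause: upper/lower-case the names,
    derive full_name, and flag employees hired before 1988."""
    emp_no, birth_date, first_name, last_name, gender, hire_date = row
    first_name = first_name.upper()    # upper(@C3)
    last_name = last_name.lower()      # lower(@C4)
    return {
        "emp_no": emp_no,
        "birth_date": birth_date,
        "first_name": first_name,
        "last_name": last_name,
        "full_name": f"{first_name} {last_name}",  # concat(first_name,' ',last_name)
        "gender": gender,
        "hire_date": hire_date,
        # delete_flag = if(hire_date < '1988-01-01', 'Y', 'N');
        # plain string comparison works because the dates are ISO-formatted
        "delete_flag": "Y" if hire_date < "1988-01-01" else "N",
    }

line = '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'
rec = transform(next(csv.reader(io.StringIO(line))))
print(rec["full_name"], rec["delete_flag"])  # GEORGI facello Y
```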

2.2 Importing data using import_table

util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })

3. import_table specific features

3.1 Multiple file import (fuzzy matching)

## Before importing, three separate employees files with identical structure were generated
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import command, using employees_* for wildcard matching
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_*",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })
    
## Import command, explicitly listing each file to import
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
        "/data/mysql/3306/tmp/employees_02.csv",
        "/data/mysql/3306/tmp/employees_03.csv"
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })
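Before running a wildcard import, it can help to preview which files the pattern will actually pick up. A small sketch using Python's glob module (the temporary directory here stands in for /data/mysql/3306/tmp; file names match the example listing):

```python
import glob
import os
import tempfile

# Recreate the example directory: three employees files plus one
# sbtest1.csv that must NOT match the pattern
tmpdir = tempfile.mkdtemp()
for name in ["employees_01.csv", "employees_02.csv",
             "employees_03.csv", "sbtest1.csv"]:
    open(os.path.join(tmpdir, name), "w").close()

# The same pattern that would be passed to util.import_table: employees_*
matched = sorted(glob.glob(os.path.join(tmpdir, "employees_*")))
print([os.path.basename(p) for p in matched])
# Only the three employees files match; sbtest1.csv is excluded
```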

3.2 Concurrent Import

Before experimenting with concurrent import, we create an sbtest1 table with 10 million rows (about 2G of data) to simulate a concurrent workload. import_table uses the threads option to configure concurrency, with a default of 8 threads.

## Export the sbtest1 data required for the test
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Set threads to 8 for eight-way concurrency
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "8"
    })

3.3 Import rate control

You can combine maxRate and threads to control the import rate of each concurrent thread. For example, with the current configuration of 4 threads and a per-thread rate limit of 2M/s, the total rate will not exceed 8M/s.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "maxRate": "2M"
    }) 
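The 8M/s ceiling quoted above is simply threads × maxRate. The sketch below makes that arithmetic explicit; the size-suffix parsing is a simplified stand-in for the k/M/G style values accepted by maxRate, not the utility's own parser:

```python
def parse_size(s):
    """Parse a size string like '2M' or '500k' into bytes.
    Sketch: assumes a single-letter binary-unit suffix (k/M/G)."""
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)

def aggregate_cap(threads, max_rate):
    """Upper bound on total import throughput: each of the
    `threads` workers is individually capped at max_rate bytes/s."""
    return threads * parse_size(max_rate)

# 4 threads, 2M/s each -> at most 8 MiB/s overall
print(aggregate_cap(4, "2M") // 1024 ** 2, "MiB/s")  # 8 MiB/s
```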

3.4 Custom chunk size

The default chunk size is 50M. We can lower the chunk size to reduce transaction size; for example, with the chunk size set to 1M, the amount of data each thread imports per transaction shrinks accordingly.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "bytesPerChunk": "1M",
        "maxRate": "2M"
    }) 
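Reducing bytesPerChunk shrinks each transaction but increases the number of chunks to schedule. A rough estimate of the chunk count for a given file size (a sketch: the utility actually splits on row boundaries, so real counts are approximate):

```python
import math

def estimate_chunks(file_size_bytes, bytes_per_chunk):
    """Approximate number of chunks a file will be split into.
    Real chunks end on row boundaries, so this is only an estimate."""
    return math.ceil(file_size_bytes / bytes_per_chunk)

ONE_MIB = 1024 ** 2
file_size = int(1.9 * 1024 ** 3)  # the ~1.9G sbtest1.csv from the example

# Default 50M chunks vs the 1M chunks configured above
print(estimate_chunks(file_size, 50 * ONE_MIB))  # ~39 chunks
print(estimate_chunks(file_size, 1 * ONE_MIB))   # ~1946 chunks
```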

4. Load Data vs import_table performance comparison

  • Use the same schema and table
  • No special processing of the data; import it as is
  • Leave parameters at their defaults, specifying only what is required

-- Load Data statement
load data infile '/data/mysql/3306/tmp/sbtest1.csv'
into table demo.sbtest1
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

-- import_table statement
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4"
    })

As you can see, Load Data took about 5 minutes, while import_table completed the import in less than half that time, making it more than twice as efficient (in a virtual-machine environment with limited disk I/O capacity).
