MySQL Shell import_table data import

1. Introduction to import_table

This article introduces an efficient data import tool from the MySQL Shell toolset: import_table, whose full name is the Parallel Table Import Utility. As the name suggests, it imports data concurrently. The tool became considerably more capable as of MySQL Shell 8.0.23. Its core features are:

  • Covers essentially all the functionality of MySQL LOAD DATA and can serve as a substitute for it
  • Concurrent import by default, with a configurable chunk size
  • Wildcard matching, so multiple files can be imported into one table in a single run (well suited to aggregating files with the same structure into one table)
  • Rate limiting (useful when bandwidth consumption must be kept in check)
  • Compressed file handling
  • Import into MySQL 5.7 and later
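Conceptually, the utility splits the input file into chunks and hands them to a pool of worker threads, each of which issues its own load statement. The following is a simplified, hypothetical sketch of that pattern in plain Python (not the utility's actual implementation; `load_chunk` is a stand-in for the per-chunk server call):

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, chunk_size):
    """Split raw bytes into fixed-size chunks (the real utility also
    aligns chunk boundaries to row endings)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def load_chunk(chunk):
    """Placeholder for the per-chunk load a worker thread would perform."""
    return len(chunk)  # pretend we imported this many bytes

data = b"x" * 100                     # stand-in for the file contents
chunks = split_into_chunks(data, 32)  # analogous to bytesPerChunk

with ThreadPoolExecutor(max_workers=8) as pool:  # analogous to "threads": 8
    imported = sum(pool.map(load_chunk, chunks))

print(imported)  # 100: every byte accounted for exactly once
```

Because each chunk is an independent unit of work, throughput scales with the thread count until disk or network I/O saturates.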

2. Load Data and import_table examples

This section shows commands that achieve the same result with load data and with import_table. We again use the employees sample data to demonstrate a comprehensive load data scenario covering:

  • Importing columns in a custom order
  • Transforming data with functions
  • Generating custom column values

The sample data is as follows:

[root@10-186-61-162 tmp]# cat employees_01.csv
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10004","1954-05-01","Christian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"
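Each record is a quoted, comma-separated line, matching the csv-unix dialect used below (fields terminated by `,`, enclosed by `"`, lines terminated by `\n`). A quick check with Python's csv module confirms how one record splits into six columns:

```python
import csv
import io

# First record from employees_01.csv
line = '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'

reader = csv.reader(io.StringIO(line), delimiter=",", quotechar='"')
row = next(reader)

print(len(row))         # 6
print(row[0], row[2])   # 10001 Georgi
```

The six positions map to emp_no, birth_date, first_name, last_name, gender, and hire_date; the table's three extra columns have no counterpart in the file and must be derived at import time.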

Example table structure:

 10.186.61.162:3306 employees SQL > desc emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(14)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
| full_name   | varchar(64)   | YES  |     | NULL    |       | -- newly added column, not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       | -- newly added column, not present in the exported data file
| delete_flag | varchar(1)    | YES  |     | NULL    |       | -- newly added column, not present in the exported data file
+-------------+---------------+------+-----+---------+-------+

2.1 Importing data using Load Data

load data infile '/data/mysql/3306/tmp/employees_01.csv'
into table employees.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3),
    last_name=lower(@C4),
    full_name=concat(first_name,' ',last_name),
    gender=@C5,
    hire_date=@C6,
    modify_date = now(),
    delete_flag=if(hire_date<'1988-01-01','Y','N');
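The SET clause is worth reading carefully: full_name is built from the already-transformed first_name and last_name, and delete_flag is derived from hire_date. The same row-level logic can be sanity-checked in plain Python against the first sample record (a hypothetical helper, not part of any MySQL API; modify_date is omitted since it is simply now()):

```python
def transform(c1, c2, c3, c4, c5, c6):
    """Mirror the load data SET clause for one input row."""
    first_name = c3.upper()
    last_name = c4.lower()
    return {
        "emp_no": c1,
        "birth_date": c2,
        "first_name": first_name,
        "last_name": last_name,
        "full_name": first_name + " " + last_name,
        "gender": c5,
        "hire_date": c6,
        # string comparison is safe here because the dates are ISO-formatted
        "delete_flag": "Y" if c6 < "1988-01-01" else "N",
    }

row = transform("10001", "1953-09-02", "Georgi", "Facello", "M", "1986-06-26")
print(row["full_name"], row["delete_flag"])  # GEORGI facello Y
```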

2.2 Importing data using import_table

util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],   ## one ordinal per column in the data file
        "decodeColumns": {
            "emp_no": "@1",                     ## first column in the file
            "birth_date": "@2",                 ## second column in the file
            "first_name": "upper(@3)",          ## third column, converted to uppercase
            "last_name": "lower(@4)",           ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)",   ## third and fourth columns merged into one field
            "gender": "@5",                     ## fifth column in the file
            "hire_date": "@6",                  ## sixth column in the file
            "modify_date": "now()",             ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')"  ## derived from the sixth column
        }
    }) 

3. import_table specific features

3.1 Multiple file import (fuzzy matching)

## Before importing, three separate employees files were generated, all with the same exported structure
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import command using employees_* for wildcard matching
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_*",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],   ## one ordinal per column in the data file
        "decodeColumns": {
            "emp_no": "@1",                     ## first column in the file
            "birth_date": "@2",                 ## second column in the file
            "first_name": "upper(@3)",          ## third column, converted to uppercase
            "last_name": "lower(@4)",           ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)",   ## third and fourth columns merged into one field
            "gender": "@5",                     ## fifth column in the file
            "hire_date": "@6",                  ## sixth column in the file
            "modify_date": "now()",             ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')"  ## derived from the sixth column
        }
    })
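The wildcard is expanded on the client side before the import starts. fnmatch-style matching in Python illustrates which of the four files in the directory the pattern employees_* would pick up:

```python
from fnmatch import fnmatch

# file names as shown in the ls -lh listing above
files = ["employees_01.csv", "employees_02.csv",
         "employees_03.csv", "sbtest1.csv"]

matched = [f for f in files if fnmatch(f, "employees_*")]
print(matched)  # the three employees files; sbtest1.csv is excluded
```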
    
## Import command listing each file path explicitly
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
        "/data/mysql/3306/tmp/employees_02.csv",
        "/data/mysql/3306/tmp/employees_03.csv"
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],   ## one ordinal per column in the data file
        "decodeColumns": {
            "emp_no": "@1",                     ## first column in the file
            "birth_date": "@2",                 ## second column in the file
            "first_name": "upper(@3)",          ## third column, converted to uppercase
            "last_name": "lower(@4)",           ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)",   ## third and fourth columns merged into one field
            "gender": "@5",                     ## fifth column in the file
            "hire_date": "@6",                  ## sixth column in the file
            "modify_date": "now()",             ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')"  ## derived from the sixth column
        }
    })

3.2 Concurrent Import

Before experimenting with concurrent import, we create an sbtest1 table with 10 million rows (about 2 GB of data) to exercise the concurrency. import_table uses the threads option to configure the degree of concurrency; the default is 8 threads.

## Export the sbtest1 data required for the test
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import with threads set to 8
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": 8
    })

3.3 Import rate control

maxRate and threads together control the import rate: maxRate caps each thread individually. For example, with 4 threads each limited to 2M/s, the aggregate throughput will not exceed 8M/s.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": 4,
        "maxRate": "2M"
    }) 
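Since maxRate is a per-thread cap, the aggregate ceiling is threads × maxRate. A quick check of the arithmetic for the configuration above (`parse_rate` is a hypothetical helper for this illustration, not part of the Shell API; maxRate itself accepts plain bytes or a k/M/G suffix):

```python
def parse_rate(rate):
    """Convert a maxRate-style value such as '2M' into bytes per second."""
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if rate[-1] in units:
        return int(rate[:-1]) * units[rate[-1]]
    return int(rate)

threads = 4
ceiling = threads * parse_rate("2M")   # aggregate cap in bytes/s
print(ceiling // 1024 ** 2)            # 8 (MiB/s)
```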

3.4 Custom chunk size

The default chunk size is 50M. Reducing it shrinks each transaction on the server: for example, with the chunk size adjusted down to 1M, the amount of data each thread imports per batch drops correspondingly.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": 4,
        "bytesPerChunk": "1M",
        "maxRate": "2M"
    }) 
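Smaller chunks mean smaller transactions but more of them. For the 1.9 GB sbtest1.csv, dropping bytesPerChunk from the default 50M to the 1M used above multiplies the chunk count roughly fifty-fold (approximate figures, treating the file size as exactly 1.9 GiB):

```python
file_size = int(1.9 * 1024 ** 3)   # approximate size of sbtest1.csv in bytes

def chunk_count(size, bytes_per_chunk):
    # ceiling division: the last chunk may be partial
    return -(-size // bytes_per_chunk)

default_chunks = chunk_count(file_size, 50 * 1024 ** 2)  # bytesPerChunk = 50M
small_chunks = chunk_count(file_size, 1 * 1024 ** 2)     # bytesPerChunk = 1M
print(default_chunks, small_chunks)  # 39 1946
```

More, smaller transactions reduce the cost of a rollback and keep undo/redo pressure lower, at the price of more round trips.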

4. Load Data vs import_table performance comparison

  • Use the same schema and table
  • Import the data as-is, with no special processing
  • Keep default options, specifying only the required parameters
-- Load Data statement
load data infile '/data/mysql/3306/tmp/sbtest1.csv'
into table demo.sbtest1
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

-- import_table statement
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4"
    })

As the results show, Load Data took about 5 minutes, while import_table completed the same import in less than half that time, more than doubling throughput (in a virtual machine environment with limited disk I/O capacity).

This concludes the walkthrough of data import with MySQL Shell's import_table.
