MySQL Shell import_table data import

1. Introduction to import_table

In this article we introduce import_table, an efficient data import tool from the MySQL Shell utility set. Its full name is the Parallel Table Import Utility and, as that name suggests, it supports concurrent data import. The tool's feature set became fairly complete as of MySQL Shell 8.0.23. Its core features are:

  • Covers essentially all of MySQL LOAD DATA's functionality and can be used as a replacement for it
  • Concurrent import by default (with a configurable chunk size)
  • Wildcard matching, so multiple files can be imported into one table in a single run (well suited to aggregating files with the same structure into one table)
  • Rate limiting (well suited to scenarios where bandwidth usage must be controlled)
  • Compressed file handling
  • Import into MySQL 5.7 and later

2. Load Data and import_table examples

This section gives commands that accomplish the same task with import_table and with LOAD DATA. We again use the employees sample data to demonstrate a comprehensive LOAD DATA scenario:

  • Importing columns in a custom order
  • Applying functions to the data
  • Generating custom column values

The sample data is as follows:

[root@10-186-61-162 tmp]# cat employees_01.csv
"10001","1953-09-02","Georgi","Facello","M","1986-06-26"
"10003","1959-12-03","Parto","Bamford","M","1986-08-28"
"10002","1964-06-02","Bezalel","Simmel","F","1985-11-21"
"10004","1954-05-01","Christian","Koblick","M","1986-12-01"
"10005","1955-01-21","Kyoichi","Maliniak","M","1989-09-12"
"10006","1953-04-20","Anneke","Preusig","F","1989-06-02"
"10007","1957-05-23","Tzvetan","Zielinski","F","1989-02-10"
"10008","1958-02-19","Saniya","Kalloufi","M","1994-09-15"
"10009","1952-04-19","Sumant","Peac","F","1985-02-18"
"10010","1963-06-01","Duangkaew","Piveteau","F","1989-08-24"
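The `csv-unix` dialect used in the import commands below means comma-separated fields, double-quoted values, and `\n` line endings. As a quick sanity check (a sketch, not part of the import itself), Python's standard csv module parses a sample line with exactly the same conventions:

```python
import csv
import io

# One row in the same format as employees_01.csv:
# comma-separated, every field wrapped in double quotes, LF line ending
sample = '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'

# csv.reader defaults (delimiter=',', quotechar='"') match the csv-unix dialect
row = next(csv.reader(io.StringIO(sample)))
print(row)

# Each line yields exactly 6 fields, one per source column
assert len(row) == 6
```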

Example table structure:

 10.186.61.162:3306  employees  SQL > desc emp;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | NO   | PRI | NULL    |       |
| birth_date  | date          | NO   |     | NULL    |       |
| first_name  | varchar(14)   | NO   |     | NULL    |       |
| last_name   | varchar(16)   | NO   |     | NULL    |       |
| full_name   | varchar(64)   | YES  |     | NULL    |       |  -- new field, not present in the exported data file
| gender      | enum('M','F') | NO   |     | NULL    |       |
| hire_date   | date          | NO   |     | NULL    |       |
| modify_date | datetime      | YES  |     | NULL    |       |  -- new field, not present in the exported data file
| delete_flag | varchar(1)    | YES  |     | NULL    |       |  -- new field, not present in the exported data file
+-------------+---------------+------+-----+---------+-------+

2.1 Importing data using Load Data

load data infile '/data/mysql/3306/tmp/employees_01.csv'
into table employees.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@C1,@C2,@C3,@C4,@C5,@C6)
set emp_no=@C1,
    birth_date=@C2,
    first_name=upper(@C3),
    last_name=lower(@C4),
    full_name=concat(first_name,' ',last_name),
    gender=@C5,
    hire_date=@C6,
    modify_date = now(),
    delete_flag=if(hire_date<'1988-01-01','Y','N');
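The SET clause above can be checked offline before running the import. The following Python sketch (illustrative only; the column names come from the table structure shown earlier) applies the same transformations to one line of the sample file:

```python
import csv
import io

def transform(row):
    """Mirror the Load Data SET clause: upper/lower-case the names,
    derive full_name, and flag employees hired before 1988."""
    emp_no, birth_date, first_name, last_name, gender, hire_date = row
    first_name = first_name.upper()    # upper(@C3)
    last_name = last_name.lower()      # lower(@C4)
    return {
        "emp_no": emp_no,
        "birth_date": birth_date,
        "first_name": first_name,
        "last_name": last_name,
        "full_name": f"{first_name} {last_name}",  # concat(first_name,' ',last_name)
        "gender": gender,
        "hire_date": hire_date,
        # delete_flag = if(hire_date < '1988-01-01', 'Y', 'N');
        # plain string comparison works because the dates are ISO-formatted
        "delete_flag": "Y" if hire_date < "1988-01-01" else "N",
    }

line = '"10001","1953-09-02","Georgi","Facello","M","1986-06-26"\n'
rec = transform(next(csv.reader(io.StringIO(line))))
print(rec["full_name"], rec["delete_flag"])  # GEORGI facello Y
```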

2.2 Importing data using import_table

util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })

3. import_table specific features

3.1 Multiple file import (fuzzy matching)

## Before importing, three separate employees files with identical structure were generated
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Import command, using employees_* for wildcard matching
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_*",
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })
    
## Import command, explicitly listing each file to import
util.import_table(
    [
        "/data/mysql/3306/tmp/employees_01.csv",
        "/data/mysql/3306/tmp/employees_02.csv",
        "/data/mysql/3306/tmp/employees_03.csv"
    ],
    {
        "schema": "employees", 
        "table": "emp",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "columns": [1,2,3,4,5,6],             ## one ordinal for each column in the file
        "decodeColumns": {
            "emp_no": "@1",                   ## first column in the file
            "birth_date": "@2",               ## second column in the file
            "first_name": "upper(@3)",        ## third column, converted to uppercase
            "last_name": "lower(@4)",         ## fourth column, converted to lowercase
            "full_name": "concat(@3,' ',@4)", ## third and fourth columns merged to produce this field
            "gender": "@5",                   ## fifth column in the file
            "hire_date": "@6",                ## sixth column in the file
            "modify_date": "now()",           ## field value generated by a function
            "delete_flag": "if(@6<'1988-01-01','Y','N')" ## field value derived from a condition on the sixth column
        }
    })
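Before running a wildcard import, it can help to preview which files the pattern will actually pick up. A small sketch using Python's glob module (the temporary directory here stands in for /data/mysql/3306/tmp; file names match the example listing):

```python
import glob
import os
import tempfile

# Recreate the example directory: three employees files plus one
# sbtest1.csv that must NOT match the pattern
tmpdir = tempfile.mkdtemp()
for name in ["employees_01.csv", "employees_02.csv",
             "employees_03.csv", "sbtest1.csv"]:
    open(os.path.join(tmpdir, name), "w").close()

# The same pattern that would be passed to util.import_table: employees_*
matched = sorted(glob.glob(os.path.join(tmpdir, "employees_*")))
print([os.path.basename(p) for p in matched])
# Only the three employees files match; sbtest1.csv is excluded
```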

3.2 Concurrent Import

Before experimenting with concurrent import, we create an sbtest1 table with 10 million rows (about 2G of data) to simulate a concurrent workload. import_table uses the threads option to configure concurrency, with a default of 8 threads.

## Export the sbtest1 data required for the test
[root@10-186-61-162 tmp]# ls -lh
total 1.9G
-rw-r----- 1 mysql mysql 579 March 24 19:07 employees_01.csv
-rw-r----- 1 mysql mysql 584 March 24 18:48 employees_02.csv
-rw-r----- 1 mysql mysql 576 March 24 18:48 employees_03.csv
-rw-r----- 1 mysql mysql 1.9G March 26 17:15 sbtest1.csv

## Set threads to 8 for eight-way concurrency
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "8"
    })

3.3 Import rate control

You can combine maxRate and threads to control the import rate of each concurrent thread. For example, with the current configuration of 4 threads and a per-thread rate limit of 2M/s, the total rate will not exceed 8M/s.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "maxRate": "2M"
    }) 
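The 8M/s ceiling quoted above is simply threads × maxRate. The sketch below makes that arithmetic explicit; the size-suffix parsing is a simplified stand-in for the k/M/G style values accepted by maxRate, not the utility's own parser:

```python
def parse_size(s):
    """Parse a size string like '2M' or '500k' into bytes.
    Sketch: assumes a single-letter binary-unit suffix (k/M/G)."""
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)

def aggregate_cap(threads, max_rate):
    """Upper bound on total import throughput: each of the
    `threads` workers is individually capped at max_rate bytes/s."""
    return threads * parse_size(max_rate)

# 4 threads, 2M/s each -> at most 8 MiB/s overall
print(aggregate_cap(4, "2M") // 1024 ** 2, "MiB/s")  # 8 MiB/s
```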

3.4 Custom chunk size

The default chunk size is 50M. We can lower the chunk size to reduce transaction size; for example, with the chunk size set to 1M, the amount of data each thread imports per transaction shrinks accordingly.

util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4",
        "threads": "4",
        "bytesPerChunk": "1M",
        "maxRate": "2M"
    }) 
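Reducing bytesPerChunk shrinks each transaction but increases the number of chunks to schedule. A rough estimate of the chunk count for a given file size (a sketch: the utility actually splits on row boundaries, so real counts are approximate):

```python
import math

def estimate_chunks(file_size_bytes, bytes_per_chunk):
    """Approximate number of chunks a file will be split into.
    Real chunks end on row boundaries, so this is only an estimate."""
    return math.ceil(file_size_bytes / bytes_per_chunk)

ONE_MIB = 1024 ** 2
file_size = int(1.9 * 1024 ** 3)  # the ~1.9G sbtest1.csv from the example

# Default 50M chunks vs the 1M chunks configured above
print(estimate_chunks(file_size, 50 * ONE_MIB))  # ~39 chunks
print(estimate_chunks(file_size, 1 * ONE_MIB))   # ~1946 chunks
```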

4. Load Data vs import_table performance comparison

  • Use the same schema and table
  • No special processing of the data; import it as is
  • Leave parameters at their defaults, specifying only what is required

-- Load Data statement
load data infile '/data/mysql/3306/tmp/sbtest1.csv'
into table demo.sbtest1
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n';

-- import_table statement
util.import_table(
    [
        "/data/mysql/3306/tmp/sbtest1.csv",
    ],
    {
        "schema": "demo", 
        "table": "sbtest1",
        "dialect": "csv-unix",
        "skipRows": 0,
        "showProgress": True,
        "characterSet": "utf8mb4"
    })

As you can see, Load Data took about 5 minutes, while import_table completed the import in less than half that time, making it more than twice as efficient (in a virtual-machine environment with limited disk I/O capacity).
