5 ways to migrate from MySQL to ClickHouse

Data needs to be migrated from MySQL into ClickHouse. The available approaches are summarized below: three methods supported by ClickHouse itself and two third-party tools.

create table with ENGINE = MySQL
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
 name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
 name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
 ...
 INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
 INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);

Official documentation: https://clickhouse.yandex/docs/en/operations/table_engines/mysql/

Note that the actual data is stored in a remote MySQL database, which can be understood as a foreign table.

You can verify this by adding and deleting data in MySQL.
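As a concrete sketch (host, database, table, column, and credential names here are all hypothetical placeholders), a ClickHouse table backed by a remote MySQL table might look like this:

```sql
-- Connection details below are placeholders.
CREATE TABLE IF NOT EXISTS mysql_users
(
    id UInt32,
    name String
) ENGINE = MySQL('127.0.0.1:3306', 'mydb', 'users', 'reader', 'secret');

-- Each query reads live data from the remote MySQL table:
SELECT count() FROM mysql_users;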

insert into select from
-- Create the table first
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
 name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
 name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
 ...
) ENGINE = engine
-- Import the data
INSERT INTO [db.]table [(c1, c2, c3)]
SELECT column or * FROM mysql('host:port', 'db', 'table_name', 'user', 'password')

You can customize the column types and the number of columns, and use ClickHouse functions to process the data, for example:

select toDate(xx) from mysql('host:port', 'db', 'table_name', 'user_name', 'password')
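For example (database, table, and column names below are hypothetical), values can be transformed on the fly during the import:

```sql
-- Assumes a local target table mydb.orders_local already exists.
INSERT INTO mydb.orders_local (id, order_date)
SELECT id, toDate(created_at)
FROM mysql('127.0.0.1:3306', 'mydb', 'orders', 'reader', 'secret');
```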
create table as select from
CREATE TABLE [IF NOT EXISTS] [db.]table_name
ENGINE = Log
AS 
SELECT *
FROM mysql('host:port', 'db', 'article_clientuser_sum', 'user', 'password')

A community article: http://jackpgao.github.io/2018/02/04/ClickHouse-Use-MySQL-Data/

Custom columns are not supported, and the ENGINE = MergeTree example written by the blogger in the referenced post failed when I tested it.

It can be understood as a combination of create table and insert into select.

Altinity/clickhouse-mysql-data-reader

Altinity has open-sourced a Python tool for migrating data from MySQL to ClickHouse (it supports both full imports and incremental updates via the binlog). However, the official README is out of sync with the code, and the quick start does not work as written.

## Create the table template
clickhouse-mysql \
 --src-host=127.0.0.1 \
 --src-user=reader \
 --src-password=Qwerty1# \
 --table-templates-with-create-database \
 --src-table=airline.ontime > create_clickhouse_table_template.sql

## Modify the script
vim create_clickhouse_table_template.sql

## Create the table from the template
clickhouse-client -mn < create_clickhouse_table_template.sql

## Import the data
clickhouse-mysql \
 --src-host=127.0.0.1 \
 --src-user=reader \
 --src-password=Qwerty1# \
 --table-migrate \
 --dst-host=127.0.0.1 \
 --dst-table=logunified \
 --csvpool

Official documentation: https://github.com/Altinity/clickhouse-mysql-data-reader#mysql-migration-case-1—migrate-existing-data

Note that all three methods above pull data from MySQL into ClickHouse; if the data volume is large, this puts heavy load on MySQL. The following two methods work offline (StreamSets supports both real-time and offline).

csv

## Skip table creation and import directly
clickhouse-client \
 -h host \
 --query="INSERT INTO [db].table FORMAT CSV" < test.csv

However, if the source data is of low quality, there will often be problems, such as special characters (delimiters, escape characters) or embedded line breaks. These bit me badly.

Custom delimiter: --format_csv_delimiter="|"

Skip bad rows instead of aborting: --input_format_allow_errors_num=10 allows up to 10 error rows; --input_format_allow_errors_ratio=0.1 allows up to 10% error rows.

Empty CSV fields in a Nullable column make the import fail with:

Code: 27. DB::Exception: Cannot parse input: expected , before: xxxx: (at row 69) ERROR: garbage after Nullable(Date): "8,002<LINE FEED>0205"

Replace ,, with ,\N, before importing:

sed ':a;s/,,/,\\N,/g;ta' | clickhouse-client -h host --query "INSERT INTO [db].table FORMAT CSV"
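The sed replacement can be sanity-checked on a sample line. The branch loop (:a ... ta) is needed because a single global pass cannot rewrite overlapping runs of commas:

```shell
# ",,," contains overlapping ",," matches, so one s///g pass is not enough;
# the :a/ta loop repeats until no substitution happens.
echo 'a,,b,,,c' | sed ':a;s/,,/,\\N,/g;ta'
# → a,\N,b,\N,\N,c
```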
python clean_csv.py --src=src.csv --dest=dest.csv --chunksize=50000 --cols --encoding=utf-8 --delimiter=,

For clean_csv.py, refer to my other article, 032 - CSV File Fault-Tolerance Processing.
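The original clean_csv.py lives in that other article; as a rough stdlib-only sketch of the same idea (the function name and behavior here are illustrative, not the original script), it drops rows with the wrong column count and rewrites empty fields as \N so ClickHouse can parse them as NULL:

```python
import csv
import io

def clean_csv(src, dest, expected_cols, delimiter=","):
    """Drop malformed rows; rewrite empty fields as \\N (ClickHouse NULL)."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dest, delimiter=delimiter, lineterminator="\n")
    kept = dropped = 0
    for row in reader:
        if len(row) != expected_cols:  # wrong column count -> skip the row
            dropped += 1
            continue
        writer.writerow([c if c != "" else r"\N" for c in row])
        kept += 1
    return kept, dropped

# Example on an in-memory file: the short row "3,bob" is dropped.
src = io.StringIO("1,alice,10\n2,,20\n3,bob\n")
dest = io.StringIO()
print(clean_csv(src, dest, expected_cols=3))  # (2, 1)
print(dest.getvalue(), end="")
```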

streamsets

StreamSets supports full imports from MySQL (or reading CSV files), and also supports incremental inserts by subscribing to the binlog. See my other article, 025 - Big Data ETL Tool StreamSets: Installation and Subscribing to the MySQL Binlog.

This article only shows how to import data from MySQL into ClickHouse, and assumes that you have already set up the StreamSets service.

Enable and restart the service

Upload the JDBC jars and dependency packages for MySQL and ClickHouse.

A convenient way is to create a pom.xml and let Maven download them all at once:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.anjia</groupId>
  <artifactId>demo</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>demo</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>ru.yandex.clickhouse</groupId>
      <artifactId>clickhouse-jdbc</artifactId>
      <version>0.1.54</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.47</version>
    </dependency>
  </dependencies>
</project>

If Maven is installed locally, execute the following command:

mvn dependency:copy-dependencies -DoutputDirectory=lib -DincludeScope=compile

All required jar files will be downloaded and copied to the lib directory.

Then copy them to the StreamSets directory /opt/streamsets-datacollector-3.9.1/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/

Restart the StreamSets service


Summary

The above are the five methods for migrating from MySQL to ClickHouse. I hope they are helpful; if you have any questions, leave me a message and I will reply promptly. Thank you all for your support of the 123WORDPRESS.COM website!
If you find this article helpful, feel free to reprint it, but please credit the source. Thank you!
