5 ways to migrate from MySQL to ClickHouse

5 ways to migrate from MySQL to ClickHouse

Data migration needs to be imported from MySQL to ClickHouse. The summary plan is as follows, including three methods supported by ClickHouse itself and two third-party tools.

create table engin mysql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
 name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
 name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
 ...
 INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
 INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = MySQL('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);

Official documentation: https://clickhouse.yandex/docs/en/operations/table_engines/mysql/

Note that the actual data is stored in a remote MySQL database, which can be understood as a foreign table.

You can verify this by adding and deleting data in MySQL.

insert into select from
-- Create a table first CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
 name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
 name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
 ...
) ENGINE = engine
-- Import data INSERT INTO [db.]table [(c1, c2, c3)] select column or * from mysql('host:port', 'db', 'table_name', 'user', 'password')

You can customize the column type and number of columns, and use clickhouse functions to process data, such as

select toDate(xx) from mysql("host:port","db","table_name","user_name","password")
create table as select from
CREATE TABLE [IF NOT EXISTS] [db.]table_name
ENGINE = Log
AS 
SELECT *
FROM mysql('host:port', 'db', 'article_clientuser_sum', 'user', 'password')

Netizen article: http://jackpgao.github.io/2018/02/04/ClickHouse-Use-MySQL-Data/

Custom columns are not supported, and the ENGIN=MergeTree test written by the blogger in the reference failed.

It can be understood as a combination of create table and insert into select

Altinity/clickhouse-mysql-data-reader

Altinity company open sourced a Python tool for migrating data from MySQL to ClickHouse (supports incremental updates of binlog and full import), but the official readme is out of sync with the code, and it does not work according to the quick start.

## Create table clickhouse-mysql \
 --src-host=127.0.0.1 \
 --src-user=reader \
 --src-password=Qwerty1# \
 --table-templates-with-create-database \
 --src-table=airline.ontime > create_clickhouse_table_template.sql
## Modify the script vim create_clickhouse_table_template.sql
## Import and create table clickhouse-client -mn < create_clickhouse_table_template.sql
## Import data into clickhouse-mysql \
 --src-host=127.0.0.1 \
 --src-user=reader \
 --src-password=Qwerty1# \
 --table-migrate \
 --dst-host=127.0.0.1 \
 --dst-table=logunified \
 --csvpool

Official documentation: https://github.com/Altinity/clickhouse-mysql-data-reader#mysql-migration-case-1—migrate-existing-data

Note that the above three methods are all imported from MySQL to ClickHouse. If the amount of data is large, it will put a lot of pressure on MySQL. The following are two offline methods (streamsets support both real-time and offline)

csv

## Ignore creating table clickhouse-client \
 -h host \
 --query="INSERT INTO [db].table FORMAT CSV" < test.csv

However, if the source data is of low quality, there will often be problems, such as special characters (delimiters, escape characters) or line breaks. I was badly cheated.

Custom delimiter, --format_csv_delimiter="|"
When an error occurs, skip it instead of aborting. --input_format_allow_errors_num=10 allows up to 10 rows of errors, --input_format_allow_errors_ratio=0.1 allows 10% errors. csv skips null values ​​and reports Code: 27. DB::Exception: Cannot parse input: expected , before: xxxx: (at row 69) ERROR: garbage after Nullable(Date): "8,002<LINE FEED>0205" sed ' :a;s/,,/,\\N,/g;ta' |clickhouse-client -h host --query "INSERT INTO [db].table FORMAT CSV" Replace ,, with ,\N,
python clean_csv.py --src=src.csv --dest=dest.csv --chunksize=50000 --cols --encoding=utf-8 --delimiter=,

clean_csv.py Refer to my other article 032-csv file fault tolerance processing

streamsets

Streamsets supports full import from MySQL or reading CSV, and also supports incremental insertion by subscribing to binlog. Please refer to my other article 025-Big Data ETL Tool StreamSets Installation and Subscription to MySQL Binlog.

This article only shows how to import clickhouse from mysql

This article assumes that you have already set up the streamsets service

Enable and restart the service

Upload the jdbc jar and dependency packages of mysql and clickhouse

Convenient way, create pom.xml and use maven to download it uniformly

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <groupId>com.anjia</groupId>
 <artifactId>demo</artifactId>
 <packaging>jar</packaging>
 <version>1.0-SNAPSHOT</version>
 <name>demo</name>
 <url>http://maven.apache.org</url>
 <dependencies>
 <dependency>
 <groupId>ru.yandex.clickhouse</groupId>
 <artifactId>clickhouse-jdbc</artifactId>
 <version>0.1.54</version>
 </dependency>
 <dependency>
 <groupId>mysql</groupId>
 <artifactId>mysql-connector-java</artifactId>
 <version>5.1.47</version>
 </dependency>
 </dependencies>
</project>

If Maven is installed locally, execute the following command

mvn dependency:copy-dependencies -DoutputDirectory=lib -DincludeScope=compile

All required jar files will be downloaded and copied to the lib directory.

Then copy it to streamsets /opt/streamsets-datacollector-3.9.1/streamsets-libs-extras/streamsets-datacollector-jdbc-lib/lib/ directory

Restart the streamsets service

       

Summarize

The above are the 5 methods of migrating MySQL to ClickHouse introduced by the editor. I hope it will be helpful to everyone. If you have any questions, please leave me a message and the editor will reply to you in time. I would also like to thank everyone for their support of the 123WORDPRESS.COM website!
If you find this article helpful, please feel free to reprint it and please indicate the source. Thank you!

You may also be interested in:
  • Detailed steps for migrating the data folder of the MySQL database
  • Detailed explanation of how to migrate a MySQL database to another machine
  • Summary of methods for migrating Oracle database to MySQL
  • Migrate mysql database to Oracle database
  • MySQL database migration quickly exports and imports large amounts of data
  • How to use mysql Innodb tablespace unloading, migration, and loading
  • Detailed explanation of MySQL data migration--data directory direct replacement precautions
  • Detailed explanation of mysql5.5 database data directory migration method
  • MySQL backup and migration data synchronization method
  • A MySQL migration plan and practical record of pitfalls

<<:  How to deploy SpringBoot project using Docker

>>:  ElementUI implements the el-form form reset function button

Recommend

Detailed explanation of CSS margin overlap and solution exploration

I recently reviewed some CSS-related knowledge po...

CSS -webkit-box-orient: vertical property lost after compilation

1. Cause The requirement is to display two lines,...

Implementation of Docker private warehouse registry deployment

As more and more Docker images are used, there ne...

Solution to prevent caching in pages

Solution: Add the following code in <head>: ...

How to uninstall MySQL 8.0 version under Linux

1. Shut down MySQL [root@localhost /]# service my...

Example of how to create a local user in mysql and grant database permissions

Preface When you install MySQL, you usually creat...

mysql5.7 remote access settings

Setting up remote access in mysql5.7 is not like ...

How to use Maxwell to synchronize MySQL data in real time

Table of contents About Maxwell Configuration and...

Summary of Mysql high performance optimization skills

Database Command Specification All database objec...

MySQL query statement grouped by time

MySQL query by year, month, week, day group 1. Qu...

JavaScript to implement simple tab bar switching content bar

This article shares the specific code of JavaScri...

Detailed explanation of react setState

Table of contents Is setState synchronous or asyn...

Detailed explanation of JavaScript array deduplication

Table of contents 1. Array deduplication 2. Dedup...