Detailed explanation of how to synchronize data from MySQL to Elasticsearch

1. Synchronization Principle

Synchronization is based on subscribing to MySQL's binlog, the log MySQL uses to record data changes in real time.

Synchronizing MySQL data to ES falls into two types: full synchronization and incremental synchronization.

Full synchronization means that, after the ES index is created for the first time, all existing data in MySQL is imported into ES in one pass.

Incremental synchronization means keeping ES up to date as MySQL data changes. These changes fall into three cases: newly inserted rows, updates to existing rows, and deleted rows; all of them must be propagated to ES.

2. logstash-input-jdbc

An official Logstash plug-in that ships with Logstash. Just download Logstash and drive the MySQL-to-Elasticsearch synchronization through a configuration file.

Advantages

  • Supports both full and incremental synchronization of MySQL data, as well as scheduled synchronization
  • Actively updated and relatively stable
  • Ships with Logstash as part of the Elastic Stack, so it is easy to use

Disadvantages

  • Deletes are not synchronized: after a row is deleted in MySQL, the corresponding document still exists in Elasticsearch
  • The minimum polling interval is one minute, so data is synchronized at most once per minute; real-time synchronization is not possible

3. go-mysql-elasticsearch

go-mysql-elasticsearch is a synchronization tool written in Go by a Chinese developer (siddontang).

Advantages

  • Supports both full and incremental synchronization of MySQL data

Disadvantages

  • Cannot synchronize all of the data to Elasticsearch
  • Still under development and relatively unstable

4. elasticsearch-jdbc

The latest release of elasticsearch-jdbc is 2.3.4, and it only supports Elasticsearch up to version 2.3.4; support for newer Elasticsearch versions has not been implemented.

Advantages

  • Supports both full and incremental synchronization of MySQL data

Disadvantages

  • The latest release is 2.3.4, which only supports Elasticsearch up to version 2.3.4
  • Deletes are not synchronized: after a row is deleted in MySQL, the corresponding document still exists in Elasticsearch

5. Logstash-input-jdbc synchronization

Step 1: Installation

Since Logstash 5.x the logstash-input-jdbc plug-in ships with Logstash. After installing Logstash, the plug-in can also be installed manually with the following command:

cd /logstash-6.4.2/bin
./logstash-plugin install logstash-input-jdbc

Step 2: Configuration

Create a new jdbc.conf in the logstash-6.4.2/config folder and configure it as follows
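
A minimal sketch of such a configuration is shown below. The MySQL address, database name, credentials, and driver-jar path are placeholders to adapt to your environment; the config/jdbc.sql statement file and the octopus index match the examples used later in this section, and document_id assumes the table has an id primary-key column.

input {
  jdbc {
    # Path to the MySQL Connector/J driver jar (placeholder path)
    jdbc_driver_library => "/path/to/mysql-connector-java-5.1.46.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL connection settings (placeholders)
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/mydb"
    jdbc_user => "root"
    jdbc_password => "root"
    # Run the statement once per minute (the minimum interval)
    schedule => "* * * * *"
    # SQL statement to execute, kept in a separate file
    statement_filepath => "config/jdbc.sql"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "octopus"
    # Use the table's primary key as the document id (assumes an `id` column)
    document_id => "%{id}"
  }
}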

Create a new jdbc.sql file in the logstash-6.4.2/config directory

select * from t_employee

Step 3: Run

cd logstash-6.4.2
# Check whether the configuration file syntax is correct
bin/logstash -f config/jdbc.conf --config.test_and_exit
# Start
bin/logstash -f config/jdbc.conf --config.reload.automatic

--config.reload.automatic: Automatically reload the configuration file content

In Kibana, create the index and then view the synchronized data:

PUT octopus
GET octopus/_search

6. go-mysql-elasticsearch synchronization

Step 1: MySQL binlog

go-mysql-elasticsearch keeps Elasticsearch in sync with inserts, updates, and deletes in MySQL by reading MySQL's binlog.

MySQL's binlog is mainly used for master-slave replication and data recovery. It records data-modifying operations (inserts, updates, and deletes). During master-slave replication, the master sends the binlog to the slave, and the slave replays the events in the binlog to stay in sync with the master.

MySQL binlog has three modes:

  • ROW: records every modified row of data; complete, but the log volume is large.
  • STATEMENT: records each SQL statement that modifies data, which keeps the log small, but statements that use functions or triggers can cause master-slave inconsistency.
  • MIXED: combines the advantages of ROW and STATEMENT, choosing ROW or STATEMENT per statement depending on the data operation being performed.

To synchronize data to the ES cluster through the MySQL binlog, only ROW mode can be used, because only ROW mode contains the actual modified row data.

Taking an UPDATE operation as an example, the binlog content in ROW mode looks like this:

SET TIMESTAMP=1527917394/*!*/;
    BEGIN
    /*!*/;
    # at 3751
    #180602 13:29:54 server id 1 end_log_pos 3819 CRC32 0x8dabdf01 Table_map: `webservice`.`building` mapped to number 74
    # at 3819
    #180602 13:29:54 server id 1 end_log_pos 3949 CRC32 0x59a8ed85 Update_rows: table id 74 flags: STMT_END_F
    
    BINLOG '
    UisSWxMBAAAARAAAAOsOAAAAAEoAAAAAAAEACndlYnNlcnZpY2UACGJ1aWxkaW5nAAYIDwEPEREG
    wACAAQAAAAHfq40=
    UisSWx8BAAAAggAAAG0PAAAAAEoAAAAAAAEAAgAG///A1gcAAAAAAAALYnVpbGRpbmctMTAADwB3
    UkRNbjNLYlV5d1k3ajVbD64WWw+uFsDWBwAAAAAAAAtidWlsZGluZy0xMAEPAHdSRE1uM0tiVXl3
    WTdqNVsPrhZbD64Whe2oWQ==
    '/*!*/;
    ### UPDATE `webservice`.`building`
    ### WHERE
    ### @1=2006 /* LONGINT meta=0 nullable=0 is_null=0 */
    ### @2='building-10' /* VARSTRING(192) meta=192 nullable=0 is_null=0 */
    ### @3=0 /* TINYINT meta=0 nullable=0 is_null=0 */
    ### @4='wRDMn3KbUywY7j5' /* VARSTRING(384) meta=384 nullable=0 is_null=0 */
    ### @5=1527754262 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
    ### @6=1527754262 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
    ### SET
    ### @1=2006 /* LONGINT meta=0 nullable=0 is_null=0 */
    ### @2='building-10' /* VARSTRING(192) meta=192 nullable=0 is_null=0 */
    ### @3=1 /* TINYINT meta=0 nullable=0 is_null=0 */
    ### @4='wRDMn3KbUywY7j5' /* VARSTRING(384) meta=384 nullable=0 is_null=0 */
    ### @5=1527754262 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
    ### @6=1527754262 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
    # at 3949
    #180602 13:29:54 server id 1 end_log_pos 3980 CRC32 0x58226b8f Xid = 182
    COMMIT /*!*/;

An example of binlog content in STATEMENT mode is as follows:

SET TIMESTAMP=1527919329/*!*/;
    update building set Status=1 where Id=2000
    /*!*/;
    # at 688
    #180602 14:02:09 server id 1 end_log_pos 719 CRC32 0x4c550a7d Xid = 200
    COMMIT /*!*/;

Comparing the UPDATE logs in ROW mode and STATEMENT mode, ROW mode records the values of every field of the modified row both before and after the change, while STATEMENT mode only records the UPDATE statement itself. Therefore, to synchronize MySQL data to ES in real time, we must use ROW-mode binlog: obtain and parse the binlog content, then call the ES document APIs to apply the changes to the ES cluster.

View and modify binlog mode

# View the binlog mode
mysql> show variables like "%binlog_format%";

# Modify the binlog mode
mysql> set global binlog_format='ROW';

# Check whether binlog is enabled
mysql> show variables like 'log_bin';

# Enable binlog: modify the my.cnf file
log-bin = mysql-bin
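
A minimal my.cnf sketch (an illustration, not a full file) that enables binlog in ROW mode for use with go-mysql-elasticsearch; the server-id value is a placeholder and must be unique among your MySQL servers:

[mysqld]
# Enable binlog; log files will be named mysql-bin.*
log-bin = mysql-bin
# ROW mode is required so the binlog contains the modified row data
binlog_format = ROW
# A server id is required once binlog is enabled (placeholder value)
server-id = 1

Restart MySQL after changing my.cnf for the settings to take effect.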

Step 2: Installation

# Install go
sudo apt-get install golang

# Install godep
go get github.com/tools/godep

# Get the go-mysql-elasticsearch plugin
go get github.com/siddontang/go-mysql-elasticsearch

# Build the go-mysql-elasticsearch plugin
cd go/src/github.com/siddontang/go-mysql-elasticsearch
make

Step 3: Configuration

go/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml

# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "root"

# Elasticsearch address
es_addr = "127.0.0.1:9200"

# Path to store data, like master.info, and dump MySQL data
data_dir = "./var"

# The following settings can be left at their defaults
# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
mysqldump = "mysqldump"

# MySQL data source
[[source]]
# The database to synchronize
schema = "test"
# Only the tables below will be synced into Elasticsearch.
# For projects that use table partitioning, wildcards can be used:
# t_[0-9]{4} matches tables t_0000 to t_9999.
tables = ["t", "t_[0-9]{4}", "tfield", "tfilter"]

# Below is for special rule mapping.
# For a table, specify the ES index and type its data should be synced to;
# if not specified, the schema name is used as the ES index and type by default.
[[rule]]
schema = "test"   # database name
table = "t"       # table name
index = "test"    # target index name
type = "t"        # target type name

# Sync all tables matching t_[0-9]{4} to index "test" with type "t".
# These tables must all share the same schema.
[[rule]]
schema = "test"
table = "t_[0-9]{4}"
index = "test"
type = "t"

# For table tfilter, only the id and name columns are synchronized
[[rule]]
schema = "test"
table = "tfilter"
index = "test"
type = "tfilter"
filter = ["id", "name"]

# For table tfield, column id is mapped to es_id and tags to es_tags.
# The "list" modifier converts the column into an ES array type; this is
# commonly used for MySQL varchar columns that store values like "a,b,c",
# which should become ["a", "b", "c"] when synchronized to ES.
[[rule]]
schema = "test"
table = "tfield"
index = "test"
type = "tfield"

[rule.field]
# Map column `id` to ES field `es_id`
id = "es_id"
# Map column `tags` to ES field `es_tags` with array type
tags = "es_tags,list"
# Keep the column name `keywords` but convert it to array type
keywords = ",list"
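
As a rough illustration (a sketch only, assuming a tfield row with columns id=1, tags='a,b,c' and keywords='x,y'), the document indexed into ES under the field mapping above would look like:

{
  "es_id": 1,
  "es_tags": ["a", "b", "c"],
  "keywords": ["x", "y"]
}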

Step 4: Run

cd go/src/github.com/siddontang/go-mysql-elasticsearch
bin/go-mysql-elasticsearch -config=./etc/river.toml
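
Once it is running, changes to the configured tables should appear in ES shortly after they happen in MySQL. One simple way to check (a sketch, assuming the test index from river.toml and ES listening on 127.0.0.1:9200) is:

curl -XGET 'http://127.0.0.1:9200/test/_search?pretty'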

7. Elasticsearch-jdbc synchronization

Download the tool

Unzip: unzip elasticsearch-jdbc-2.3.2.0-dist.zip

Setting Environment Variables

[root@autofelix /]# vi /etc/profile
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

Make environment variables effective

[root@autofelix /]# source /etc/profile

Configuration reference

Step 1: Create a new folder odbc_es in the root directory as follows

[root@autofelix /]# ll /odbc_es/
drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs
-rwxrwxrwx 1 root root 542 Jun 16 04:03 mysql_import_es.sh

Step 2: Create a new script mysql_import_es.sh with the following content

[root@autofelix odbc_es]# cat mysql_import_es.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
"type" : "jdbc",
"jdbc" : {
"elasticsearch.autodiscover" : true,
"elasticsearch.cluster" : "my-application",
"url" : "jdbc:mysql://10.8.5.101:3306/test",
"user" : "root",
"password" : "123456",
"sql" : "select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex",
"type" : "mytype"
}
}' | java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter

In this script, elasticsearch.cluster is the cluster name (see /usr/local/elasticsearch/config/elasticsearch.yml); url, user and password are the MySQL database address, username and password; sql is the query whose result set is imported; index and type are the new ES index and type.

Step 3: Add executable permissions to mysql_import_es.sh.

[root@autofelix odbc_es]# chmod a+x mysql_import_es.sh

Step 4: Execute the script mysql_import_es.sh

[root@autofelix odbc_es]# ./mysql_import_es.sh
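
After the script finishes, the imported data can be checked (a sketch, assuming ES's HTTP port is the default 9200 on the same host used in the script):

curl -XGET 'http://10.8.5.101:9200/myindex/_search?pretty'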

This is the end of this article on how to synchronize MySQL data to Elasticsearch. For more information about synchronizing MySQL data to Elasticsearch, please search for previous articles on 123WORDPRESS.COM. I hope you will support 123WORDPRESS.COM in the future!