Written in front

I've found some time to write an update on my experience with big data. When I was first choosing the architecture, I considered using Hadoop as the data warehouse, but a Hadoop deployment needs a fairly large cluster (typically six or more servers), so I chose ClickHouse (hereinafter CH) instead. A CH cluster can start with as few as three servers, and even that is not strictly required; it mainly depends on whether your ZooKeeper is clustered. CH is also very fast: when the data volume is not huge, a single machine can handle most OLAP scenarios. Let's get to the point.

Related environment:
Docker is installed on every server, and each server runs three containers: ch-main, ch-sub, and zookeeper_node, as shown in the figure. Careful observers will notice that nothing is listed under PORTS: the containers use Docker's host network mode, which is simple, performs well, and avoids a lot of container-to-container and cross-server communication problems; it has been around for a long time.

Environment deployment

1. Server environment configuration

Run vim /etc/hosts on each server and add the following entries:

172.192.13.10 server01
172.192.13.11 server02
172.192.13.12 server03

2. Install Docker

Too simple; skipped.

3. Pull the clickhouse and zookeeper images

Too simple; skipped.

Zookeeper cluster deployment

On each server, create a folder to hold the ZooKeeper configuration wherever you prefer (here /usr/soft/zookeeper/), then run the following startup command on each server in turn.

Server01 executes:

docker run -d -p 2181:2181 -p 2888:2888 -p 3888:3888 --name zookeeper_node --restart always \
 -v /usr/soft/zookeeper/data:/data \
 -v /usr/soft/zookeeper/datalog:/datalog \
 -v /usr/soft/zookeeper/logs:/logs \
 -v /usr/soft/zookeeper/conf:/conf \
 --network host \
 -e ZOO_MY_ID=1 zookeeper

Server02 executes:

docker run -d -p 2181:2181 -p 2888:2888 -p 3888:3888 --name zookeeper_node --restart always \
 -v /usr/soft/zookeeper/data:/data \
 -v /usr/soft/zookeeper/datalog:/datalog \
 -v /usr/soft/zookeeper/logs:/logs \
 -v /usr/soft/zookeeper/conf:/conf \
 --network host \
 -e ZOO_MY_ID=2 zookeeper

Server03 executes:

docker run -d -p 2181:2181 -p 2888:2888 -p 3888:3888 --name zookeeper_node --restart always \
 -v /usr/soft/zookeeper/data:/data \
 -v /usr/soft/zookeeper/datalog:/datalog \
 -v /usr/soft/zookeeper/logs:/logs \
 -v /usr/soft/zookeeper/conf:/conf \
 --network host \
 -e ZOO_MY_ID=3 zookeeper

The only difference between the three commands is -e ZOO_MY_ID=*.

Next, open /usr/soft/zookeeper/conf on each server, find the zoo.cfg configuration file, and modify it to:

dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
clientPort=2181
autopurge.snapRetainCount=3
autopurge.purgeInterval=0
maxClientCnxns=60
server.1=172.192.13.10:2888:3888
server.2=172.192.13.11:2888:3888
server.3=172.192.13.12:2888:3888

Then enter one of the servers and check inside the zk container whether the cluster came up correctly:

docker exec -it zookeeper_node /bin/bash
./bin/zkServer.sh status

Clickhouse cluster deployment

1. Copy the configuration from a temporary container

Run a temporary container so that the configuration, data, logs, and other files can be kept on the host:

docker run --rm -d --name=temp-ch yandex/clickhouse-server

Copy the files out of the container:

docker cp temp-ch:/etc/clickhouse-server/ /etc/
//https://www.cnblogs.com/EminemJK/p/15138536.html

2. Modify the config.xml configuration

//Listen on all IPv4 addresses (use :: instead if you also need IPv6)
<listen_host>0.0.0.0</listen_host>
//Set the time zone
<timezone>Asia/Shanghai</timezone>
//Remove the sample cluster definitions from the original <remote_servers> node and reference the external file instead
<remote_servers incl="clickhouse_remote_servers" />
//Newly added, at the same level as the remote_servers node above
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
//Newly added, at the same level as the remote_servers node above
<zookeeper incl="zookeeper-servers" optional="true" />
//Newly added, at the same level as the remote_servers node above
<macros incl="macros" optional="true" />

For listen_host, keep only one entry and comment out the others.
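Putting those edits together, the touched fragment of config.xml should end up looking roughly like the sketch below (my consolidation, not a full file; every other element stays as shipped in the image):

<!-- sketch of the edited fragment of /etc/clickhouse-server/config.xml -->
<listen_host>0.0.0.0</listen_host>
<timezone>Asia/Shanghai</timezone>
<remote_servers incl="clickhouse_remote_servers" />
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />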
3. Copy the configuration to other folders

cp -rf /etc/clickhouse-server/ /usr/soft/clickhouse-server/main
cp -rf /etc/clickhouse-server/ /usr/soft/clickhouse-server/sub

main is the primary shard instance and sub is the replica instance.

4. Distribute to the other servers

#Copy the configuration to server02
scp -r /usr/soft/clickhouse-server/main/ server02:/usr/soft/clickhouse-server/main/
scp -r /usr/soft/clickhouse-server/sub/ server02:/usr/soft/clickhouse-server/sub/
#Copy the configuration to server03
scp -r /usr/soft/clickhouse-server/main/ server03:/usr/soft/clickhouse-server/main/
scp -r /usr/soft/clickhouse-server/sub/ server03:/usr/soft/clickhouse-server/sub/

scp really is handy. Afterwards you can delete the temporary container:

docker rm -f temp-ch

Configuring the cluster

There are three servers here, each running two CH instances, which back each other up in a ring to provide high availability. When resources allow, the sub (replica) instances can be moved onto completely separate machines just by adjusting the configuration; this is another strength of ClickHouse, since horizontal expansion is very convenient.

1. Modify the configuration

Log in to server01 and edit the config.xml under /usr/soft/clickhouse-server/sub/conf, changing the following entries.

Original:

<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<interserver_http_port>9009</interserver_http_port>

Modified to:

<http_port>8124</http_port>
<tcp_port>9001</tcp_port>
<mysql_port>9005</mysql_port>
<interserver_http_port>9010</interserver_http_port>

The point of the change is to keep the sub instance distinct from the main shard instance, because a port cannot be bound by two programs at the same time. Server02 and server03 are modified the same way, or the edited files can be distributed with scp.
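As an optional sanity check (my addition, not part of the original write-up), you can grep the sub instance's config to confirm all four ports were changed before moving on:

grep -E "http_port|tcp_port|mysql_port|interserver_http_port" /usr/soft/clickhouse-server/sub/conf/config.xml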
2. Add the cluster configuration file metrika.xml

server01, main (primary shard) configuration: go to the /usr/soft/clickhouse-server/main/conf folder and add a metrika.xml file (file encoding: utf-8):

<yandex>
 <!-- CH cluster configuration, identical on all servers -->
 <clickhouse_remote_servers>
  <cluster_3s_1r>
   <!-- Data shard 1 -->
   <shard>
    <internal_replication>true</internal_replication>
    <replica>
     <host>server01</host>
     <port>9000</port>
     <user>default</user>
     <password></password>
    </replica>
    <replica>
     <host>server03</host>
     <port>9001</port>
     <user>default</user>
     <password></password>
    </replica>
   </shard>
   <!-- Data shard 2 -->
   <shard>
    <internal_replication>true</internal_replication>
    <replica>
     <host>server02</host>
     <port>9000</port>
     <user>default</user>
     <password></password>
    </replica>
    <replica>
     <host>server01</host>
     <port>9001</port>
     <user>default</user>
     <password></password>
    </replica>
   </shard>
   <!-- Data shard 3 -->
   <shard>
    <internal_replication>true</internal_replication>
    <replica>
     <host>server03</host>
     <port>9000</port>
     <user>default</user>
     <password></password>
    </replica>
    <replica>
     <host>server02</host>
     <port>9001</port>
     <user>default</user>
     <password></password>
    </replica>
   </shard>
  </cluster_3s_1r>
 </clickhouse_remote_servers>
 <!-- zookeeper-servers is identical on all instances -->
 <zookeeper-servers>
  <node index="1">
   <host>172.192.13.10</host>
   <port>2181</port>
  </node>
  <node index="2">
   <host>172.192.13.11</host>
   <port>2181</port>
  </node>
  <node index="3">
   <host>172.192.13.12</host>
   <port>2181</port>
  </node>
 </zookeeper-servers>
 <!-- macros differs on every instance -->
 <macros>
  <layer>01</layer>
  <shard>01</shard>
  <replica>cluster01-01-1</replica>
 </macros>
 <networks>
  <ip>::/0</ip>
 </networks>
 <!-- Data compression algorithm -->
 <clickhouse_compression>
  <case>
   <min_part_size>10000000000</min_part_size>
   <min_part_size_ratio>0.01</min_part_size_ratio>
   <method>lz4</method>
  </case>
 </clickhouse_compression>
</yandex>

The <macros> node is different for every server and every instance; the other nodes can stay the same. Only the <macros> differences are listed below.

server01, sub (replica) configuration:

<macros>
 <layer>01</layer>
 <shard>02</shard>
 <replica>cluster01-02-2</replica>
</macros>

server02, main (primary shard) configuration:

<macros>
 <layer>01</layer>
 <shard>02</shard>
 <replica>cluster01-02-1</replica>
</macros>

server02, sub (replica) configuration:

<macros>
 <layer>01</layer>
 <shard>03</shard>
 <replica>cluster01-03-2</replica>
</macros>

server03, main (primary shard) configuration:

<macros>
 <layer>01</layer>
 <shard>03</shard>
 <replica>cluster01-03-1</replica>
</macros>

server03, sub (replica) configuration (it holds the second replica of shard 1):

<macros>
 <layer>01</layer>
 <shard>01</shard>
 <replica>cluster01-01-2</replica>
</macros>

At this point, all configuration is complete. Other settings, such as passwords, can be added as needed.
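As an extra check (my addition, not in the original write-up), once the instances are started in the next section you can confirm that each one picked up its own macro values by querying the standard system.macros table; the rows should match that instance's <macros> block:

-- run on each instance; every row maps a macro name to its substitution
SELECT macro, substitution FROM system.macros;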
Cluster operation and testing

Start the instances on each server in turn. ZooKeeper should already be running from the earlier step; if not, bring up the zk cluster first.

Run the main instance:

docker run -d --name=ch-main -p 8123:8123 -p 9000:9000 -p 9009:9009 --ulimit nofile=262144:262144 \
 -v /usr/soft/clickhouse-server/main/data:/var/lib/clickhouse:rw \
 -v /usr/soft/clickhouse-server/main/conf:/etc/clickhouse-server:rw \
 -v /usr/soft/clickhouse-server/main/log:/var/log/clickhouse-server:rw \
 --add-host server01:172.192.13.10 \
 --add-host server02:172.192.13.11 \
 --add-host server03:172.192.13.12 \
 --hostname server01 \
 --network host \
 --restart=always \
 yandex/clickhouse-server

Run the sub instance:

docker run -d --name=ch-sub -p 8124:8124 -p 9001:9001 -p 9010:9010 --ulimit nofile=262144:262144 \
 -v /usr/soft/clickhouse-server/sub/data:/var/lib/clickhouse:rw \
 -v /usr/soft/clickhouse-server/sub/conf:/etc/clickhouse-server:rw \
 -v /usr/soft/clickhouse-server/sub/log:/var/log/clickhouse-server:rw \
 --add-host server01:172.192.13.10 \
 --add-host server02:172.192.13.11 \
 --add-host server03:172.192.13.12 \
 --hostname server01 \
 --network host \
 --restart=always \
 yandex/clickhouse-server

The only parameter that changes from server to server is --hostname, which must match the hostname we configured earlier for that machine. Otherwise, when you run select * from system.clusters to inspect the cluster, the is_local column will be all 0, meaning the instance cannot recognise itself as the local node; this is something to watch out for.

After all instances are up, open them with (a properly licensed copy of) DataGrip and run the following query on any instance:

create table T_UserTest on cluster cluster_3s_1r
(
 ts DateTime,
 uid String,
 biz String
)
 engine = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/T_UserTest', '{replica}')
 PARTITION BY toYYYYMMDD(ts)
 ORDER BY ts
 SETTINGS index_granularity = 8192;

cluster_3s_1r is the cluster name configured earlier; the two must match exactly. /clickhouse/tables/ is a fixed prefix; see the official documentation for the related syntax.

Refresh each instance and you will see that every instance now has the T_UserTest table: because ZooKeeper is in place, distributed DDL is easy to use. Next, create a Distributed table on top of it:

CREATE TABLE T_UserTest_All ON CLUSTER cluster_3s_1r AS T_UserTest
ENGINE = Distributed(cluster_3s_1r, default, T_UserTest, rand())

Insert a row into the local table on each primary shard:

--server01
insert into T_UserTest values ('2021-08-16 17:00:00','1','1')
--server02
insert into T_UserTest values ('2021-08-16 17:00:00','2','1')
--server03
insert into T_UserTest values ('2021-08-16 17:00:00','3','1')

Then query the distributed table with select * from T_UserTest_All and all three rows come back. Querying the corresponding replica tables, or shutting down the Docker instance on one of the servers, should not affect the query; due to time constraints this was not tested here (a possible way to check it is sketched below).

This concludes this article on ClickHouse Docker cluster configuration and deployment.
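The failover behaviour above was left untested in the original; as a rough sketch of how one might verify it (assuming the test rows inserted above are present, and the choice of which instance to stop is arbitrary):

# on server02, stop the main instance to simulate a failure of shard 2's primary
docker stop ch-main
# from a surviving instance on another server, the distributed query should still return all three rows,
# because every shard still has one live replica
docker exec -it ch-main clickhouse-client --query "SELECT * FROM T_UserTest_All"
# restore the stopped instance on server02 afterwards
docker start ch-main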