The most convenient way to build a ZooKeeper server in history (recommended)

What is ZooKeeper

ZooKeeper is a top-level Apache project that provides efficient and highly available distributed coordination services for distributed applications. It offers basic distributed services such as data publishing/subscription, load balancing, naming services, distributed coordination/notification, and distributed locks. Thanks to its ease of use, excellent performance, and good stability, ZooKeeper is widely used in large distributed systems such as Hadoop, HBase, Kafka, and Dubbo.

ZooKeeper has three operating modes: standalone mode, pseudo-cluster mode, and cluster mode.

  • Standalone mode: generally suitable for development and test environments, both because machine resources are limited there and because daily development and debugging do not require extreme stability.
  • Cluster mode: a ZooKeeper cluster is usually composed of a group of machines; generally, three or more machines can form a usable ZooKeeper cluster. Each machine in the cluster maintains the current server state in memory, and the machines keep communicating with one another.
  • Pseudo-cluster mode: a special cluster mode in which all servers of the cluster are deployed on a single machine. When you have a capable machine at hand, deploying it in standalone mode would waste resources; in this case ZooKeeper lets you start multiple ZooKeeper server instances on one machine on different ports, providing a service with cluster characteristics to the outside.

ZooKeeper related knowledge

  • Leader: responsible for initiating and resolving votes and for updating the system state.
  • Follower: receives client requests, returns results to the client, and votes during leader election.
  • Observer: accepts client connections and forwards write requests to the leader, but does not take part in voting; it only exists to scale the cluster and increase read throughput.

Zookeeper Data Model

  • A hierarchical directory structure, named following conventional file system conventions, similar to Linux.
  • Each node in ZooKeeper is called a znode and has a unique path identifier.
  • A znode can contain data and child nodes, but an EPHEMERAL znode cannot have child nodes.
  • The data in a znode can have multiple versions; if a given path holds multiple versions of data, a version number is required to query or conditionally update the data under that path (a minimal read/write sketch follows this list).
  • Client applications can set watchers on nodes.
  • A znode does not support partial reads and writes, only one-shot full reads and writes.
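
To make the version and watcher points concrete, here is a minimal sketch using the official Java client. The connection string matches the cluster built later in this article, but the /app/config path is made up for this illustration. It reads a znode together with its Stat and then performs a conditional update that only succeeds if the data has not changed since the read.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class VersionedWriteExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection; a production client would also wait for the
        // SyncConnected event before issuing requests.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Create the demo nodes if they do not exist yet (hypothetical paths).
        if (zk.exists("/app", false) == null) {
            zk.create("/app", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        if (zk.exists("/app/config", false) == null) {
            zk.create("/app/config", "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Read the data together with its Stat, which carries the version number.
        Stat stat = new Stat();
        byte[] data = zk.getData("/app/config", false, stat);
        System.out.println("data=" + new String(data) + ", version=" + stat.getVersion());

        // Conditional update: only succeeds if nobody changed the node since our read.
        zk.setData("/app/config", "v2".getBytes(), stat.getVersion());

        zk.close();
    }
}
```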

ZooKeeper Node Features

ZooKeeper nodes have a lifecycle that depends on the node type. By duration, nodes are divided into persistent nodes (PERSISTENT) and ephemeral nodes (EPHEMERAL); by whether they are ordered, into sequential nodes (SEQUENTIAL) and unordered nodes (the default).

Once a persistent node is created, it remains in ZooKeeper until it is explicitly removed; it does not disappear when the session of the client that created it becomes invalid. An ephemeral node, by contrast, is removed automatically once the creating client's session ends (a small sketch of both lifetimes follows).
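
A rough illustration of the two lifetimes, again with made-up paths: the persistent node survives the client session, while the ephemeral node is removed automatically when the session that created it closes.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeLifetimeExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Persistent node: stays in ZooKeeper until explicitly deleted.
        // (Re-running this program fails with NodeExistsException because it survives.)
        zk.create("/persistent-demo", "stays".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral node: tied to this session; it disappears when the session ends.
        zk.create("/ephemeral-demo", "goes away".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Closing the session removes /ephemeral-demo but leaves /persistent-demo in place.
        zk.close();
    }
}
```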

Application Scenarios of Zookeeper

ZooKeeper is a highly available distributed data management and system coordination framework. Based on its implementation of the ZAB protocol (ZooKeeper Atomic Broadcast, a Paxos-like consensus algorithm), the framework guarantees strong consistency of data in a distributed environment, and it is this property that lets ZooKeeper solve many distributed problems.

It is worth noting that ZooKeeper was not designed for these application scenarios. Instead, many developers later explored typical usage methods based on the characteristics of the framework and the series of API interfaces (or primitive sets) it provides.

Data publishing and subscription (configuration center)

The publish/subscribe model, also known as a configuration center, means that publishers write data to ZK nodes for subscribers to obtain dynamically, achieving centralized management and dynamic updates of configuration information. Global configuration information and the service address lists of service frameworks, for example, are well suited to this model.

Some configuration used by an application is placed on ZK for centralized management. The typical flow is: the application fetches the configuration once at startup and at the same time registers a Watcher on the node; every time the configuration is updated, subscribed clients are notified in real time and can fetch the latest configuration (a minimal sketch of this pattern follows). Typical examples:

  • Distributed search services: the index metadata and the status of the machines in the server cluster are stored on designated ZK nodes for the various clients to subscribe to.
  • Distributed log collection systems: the core job is to collect logs scattered across machines. The collector usually allocates collection task units per application, so a node P is created on ZK with the application name as the path, and all machine IPs of that application are registered as child nodes of P; when machines change, the collectors are notified in real time and can rebalance the tasks.
  • Runtime information that must be obtained dynamically and is occasionally modified by hand: this is usually exposed through an interface such as JMX. With ZK you no longer need to build such a mechanism yourself; just store the information on a designated ZK node.

Note: all of the scenarios above share a default premise: the data volume is small, but the data may be updated relatively fast.
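
Below is a minimal sketch of the "fetch once, then watch" pattern, assuming a /config/app node already exists; the path is an assumption for illustration. ZooKeeper watches are one-shot, so the watcher re-reads and re-registers on every change.

```java
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConfigWatcherExample {
    private static ZooKeeper zk;

    public static void main(String[] args) throws Exception {
        zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});
        readConfig();                 // fetch the configuration once on startup
        Thread.sleep(Long.MAX_VALUE); // keep the process alive to receive notifications
    }

    // Read the config node and re-register the watcher (watches are one-shot).
    private static void readConfig() throws Exception {
        byte[] data = zk.getData("/config/app", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    readConfig(); // re-read and re-watch on every update
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, new Stat());
        System.out.println("current config: " + new String(data));
    }
}
```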

Load Balancing

The load balancing mentioned here is soft load balancing. In a distributed environment, to ensure high availability, the provider of a given application or service usually deploys multiple equivalent instances. Consumers then need to choose one of these peer servers to execute the related business logic; a typical case is producer and consumer load balancing in message middleware.
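
A rough sketch of client-side soft load balancing on top of ZK: the consumer lists the children of a provider node and picks one at random. The /services/demo/providers path is a made-up convention for this sketch; re-listing with a watch would additionally let the consumer react to providers joining or leaving.

```java
import java.util.List;
import java.util.Random;

import org.apache.zookeeper.ZooKeeper;

public class RandomProviderExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Hypothetical path: each provider registers itself as a child node here.
        List<String> providers = zk.getChildren("/services/demo/providers", false);
        if (providers.isEmpty()) {
            System.out.println("no provider available");
        } else {
            // Simple soft load balancing: pick a provider at random.
            String chosen = providers.get(new Random().nextInt(providers.size()));
            System.out.println("calling provider: " + chosen);
        }

        zk.close();
    }
}
```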

Naming Service

Naming service is also a common scenario in distributed systems. In a distributed system, by using a naming service, client applications can obtain the address, provider, and other information of a resource or service based on the specified name. The named entities can usually be machines in a cluster, service addresses provided, remote objects, etc. - we can collectively refer to them as names. Among them, the more common ones are the service address lists in some distributed service frameworks. By calling the node creation API provided by ZK, you can easily create a globally unique path, which can be used as a name.

Alibaba's open-source distributed service framework Dubbo uses ZooKeeper as its naming service to maintain a global service address list. In Dubbo's implementation: when a service provider starts, it writes its URL address under the designated node /dubbo/${serviceName}/providers on ZK, which completes service publication. When a service consumer starts, it subscribes to the provider URLs under /dubbo/${serviceName}/providers and writes its own URL under /dubbo/${serviceName}/consumers. Note that all addresses registered with ZK are ephemeral nodes, which ensures that service providers and consumers automatically sense resource changes.

In addition, Dubbo also provides service-level monitoring by subscribing to the information of all providers and consumers in the /dubbo/${serviceName} directory.
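
The registration half of this pattern can be sketched roughly as follows. This is a simplified illustration of the idea, not Dubbo's actual implementation; the service name, the provider URL, and the createIfAbsent helper are all made up for this sketch. The key point is that the provider registers itself as an EPHEMERAL child, so its address disappears automatically when its session dies.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ProviderRegistrationExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Make sure the persistent parent path exists (hypothetical service name).
        createIfAbsent(zk, "/dubbo");
        createIfAbsent(zk, "/dubbo/com.example.DemoService");
        createIfAbsent(zk, "/dubbo/com.example.DemoService/providers");

        // Register this provider as an EPHEMERAL child: if the provider process dies,
        // its session expires and the address vanishes from the list automatically.
        String url = "dubbo://192.168.0.10:20880/com.example.DemoService"; // made-up address
        zk.create("/dubbo/com.example.DemoService/providers/" + java.net.URLEncoder.encode(url, "UTF-8"),
                new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        Thread.sleep(Long.MAX_VALUE); // keep the session (and the registration) alive
    }

    private static void createIfAbsent(ZooKeeper zk, String path) throws Exception {
        try {
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException ignored) {
            // already created by another process
        }
    }
}
```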

Distributed Notification/Coordination

ZooKeeper's unique watcher registration and asynchronous notification mechanism can effectively implement notification and coordination between different systems in a distributed environment, and realize real-time processing of data changes. The usage method is usually that different systems register the same znode on ZK and monitor the changes of the znode (including the content of the znode itself and its child nodes). If one system updates the znode, the other system can receive the notification and make corresponding processing.

Typical usage patterns include:

  • Heartbeat detection: the detecting system and the detected system are not linked directly but through a node on ZK, which greatly reduces coupling between them.
  • System scheduling: a system consists of a console and a push system, where the console controls the push work. Operations performed by administrators in the console actually modify the state of certain nodes on ZK; ZK notifies the clients that registered Watchers on them, i.e. the push system, which then carries out the corresponding push tasks.
  • Work reporting: similar to a task distribution system. After a subtask starts, it registers an ephemeral node on ZK and regularly reports its progress by writing it back to that node, so the task manager can follow progress in real time (a sketch follows this list).
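
A minimal sketch of the work-reporting mode, assuming a /tasks parent path and a task named task-001 (both made up for this illustration): the subtask announces itself with an ephemeral node and periodically writes its progress back to it, so a manager watching the node sees progress in near real time and notices immediately if the worker dies.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class WorkReportExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Ensure the persistent parent exists (hypothetical path).
        if (zk.exists("/tasks", false) == null) {
            zk.create("/tasks", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // The subtask announces itself with an ephemeral node; if this process
        // crashes, the node vanishes and the task manager notices.
        String path = "/tasks/task-001";
        zk.create(path, "0%".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Periodically write progress back to the same node (-1 = skip the version check).
        for (int done = 10; done <= 100; done += 10) {
            Thread.sleep(1000); // pretend to do some work
            zk.setData(path, (done + "%").getBytes(), -1);
        }

        zk.close(); // task finished; the ephemeral node is removed with the session
    }
}
```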

Distributed Locks

Distributed locks rely mainly on the strong data consistency guaranteed by ZooKeeper. Lock services fall into two categories: one maintains exclusivity, the other controls ordering.

Exclusivity means that among all clients trying to obtain the lock, only one can succeed. The usual practice is to treat a znode on ZK as the lock and implement it by creating the znode: all clients attempt to create a /distribute_lock node, and the client that creates it successfully owns the lock. Controlling ordering means that all clients trying to obtain the lock will eventually be scheduled to execute, but in a global order. The approach is similar, except that /distribute_lock already exists and each client creates an ephemeral sequential node under it (specified via CreateMode.EPHEMERAL_SEQUENTIAL). The parent node (/distribute_lock) maintains a sequence that guarantees the creation order of its children, which forms the global ordering of the clients.

Because children of the same node cannot share a name, successfully creating a znode under a given node means the lock has been acquired. Other clients register a watcher on this znode and are notified to retry acquiring the lock as soon as it is deleted. With ephemeral sequential nodes, each request creates a node under the lock node; because the names are sequential, the client with the smallest sequence number obtains the lock, and when the lock is released the next sequence number is notified to take it.
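
A bare-bones sketch of the exclusive variant: every client tries to create the same ephemeral /distribute_lock node; the one that succeeds holds the lock, and the others watch for its deletion and retry. This only illustrates the principle, assuming the connection details above; a production lock would normally use the sequential-node variant to avoid the herd effect, or a library such as Curator.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ExclusiveLockExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        while (true) {
            try {
                // Whoever creates the node first owns the lock; EPHEMERAL means the lock
                // is released automatically if the holder's session dies.
                zk.create("/distribute_lock", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                System.out.println("lock acquired");
                break;
            } catch (KeeperException.NodeExistsException e) {
                // Lock is held by someone else: wait until the node is deleted, then retry.
                CountDownLatch latch = new CountDownLatch(1);
                if (zk.exists("/distribute_lock", event -> {
                    if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                        latch.countDown();
                    }
                }) != null) {
                    latch.await();
                }
            }
        }

        // ... critical section ...

        zk.delete("/distribute_lock", -1); // release the lock
        zk.close();
    }
}
```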

Distributed Queues

In terms of queues, there are two types in simple terms. One is the regular first-in-first-out queue, and the other is to wait until all the queue members gather together before executing in order. The basic principle of the first type of queue is the same as that of the timing control scenario in the distributed lock service mentioned above, so I will not go into details here.

The second type of queue is actually an enhancement based on the FIFO queue. Usually, you can pre-create a /queue/num node under the /queue znode and assign a value of n (or directly assign n to /queue) to indicate the queue size. After that, each time a queue member joins, it is determined whether the queue size has been reached to decide whether execution can begin. A typical scenario for this usage is that in a distributed environment, a large task, Task A, can only be performed after many subtasks are completed (or conditions are ready). At this time, whenever one of the subtasks is completed (ready), it will create its own temporary sequence node ( CreateMode.EPHEMERAL_SEQUENTIAL ) under /taskList . When /taskList finds that the number of child nodes under it meets the specified number, it can proceed to the next step and process them in sequence.
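
A sketch of the coordinator side of this "gather then execute" queue, with /taskList and a required count of 3 as assumptions (in practice the count would be read from /queue/num as described above, and each subtask would add itself with CreateMode.EPHEMERAL_SEQUENTIAL). Polling keeps the sketch short; a child watcher would avoid it.

```java
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BarrierQueueExample {
    private static final int REQUIRED = 3; // assumed queue size, e.g. stored in /queue/num

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {});

        // Ensure the gathering node exists (hypothetical path).
        if (zk.exists("/taskList", false) == null) {
            zk.create("/taskList", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Block until enough subtasks have registered themselves under /taskList.
        while (true) {
            List<String> children = zk.getChildren("/taskList", false);
            if (children.size() >= REQUIRED) {
                Collections.sort(children); // sequential names sort into arrival order
                System.out.println("all subtasks ready, processing in order: " + children);
                break;
            }
            Thread.sleep(500);
        }

        zk.close();
    }
}
```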

Use docker-compose to build a cluster

We have introduced so many application scenarios of ZooKeeper above. Next, we will first learn how to build a ZooKeeper cluster and then practice the above application scenarios.

The directory structure of the file is as follows:

├── docker-compose.yml

Write the docker-compose.yml file

The contents of the docker-compose.yml file are as follows:

version: '3.4'
services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181

In this configuration file, Docker Compose starts three containers from the zookeeper image and binds the local ports 2181, 2182, and 2183 to port 2181 of the corresponding containers through the ports field.

ZOO_MY_ID and ZOO_SERVERS are two environment variables required to build a Zookeeper cluster. ZOO_MY_ID identifies the service ID, which is an integer between 1 and 255 and must be unique in the cluster. ZOO_SERVERS is a list of hosts in the cluster.

Execute docker-compose up in the directory where docker-compose.yml is located to view the startup log.
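
Once the startup log settles, you can check which instance was elected leader. A possible check, assuming the compose project directory is named zookeeper so the containers are called zookeeper_zoo1_1 and so on (container names depend on your directory name and Compose version, so adjust as needed); one instance should report Mode: leader and the other two Mode: follower:

```bash
docker exec -it zookeeper_zoo1_1 zkServer.sh status
docker exec -it zookeeper_zoo2_1 zkServer.sh status
docker exec -it zookeeper_zoo3_1 zkServer.sh status
```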

Connecting to ZooKeeper

After starting the cluster, we can connect to ZooKeeper to perform node-related operations on it.

First we need to download ZooKeeper from the official ZooKeeper download page. Unzip it, go to its conf directory, and rename zoo_sample.cfg to zoo.cfg.

Configuration file description

# The number of milliseconds of each tick
# tickTime: client-server heartbeat interval. The interval at which heartbeats are exchanged between ZooKeeper servers, and between clients and servers; one heartbeat is sent every tickTime. tickTime is in milliseconds.
tickTime=2000

# The number of ticks that the initial
# synchronization phase can take
# initLimit: LF initial communication time limit # The maximum number of heartbeats (number of tickTimes) that can be tolerated during the initial connection between the follower server (F) and the leader server (L) in the cluster.
initLimit=5

# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# syncLimit: LF synchronization communication time limit # The maximum number of heartbeats (number of tickTimes) that can be tolerated between requests and responses between the follower server and the leader server in the cluster.
syncLimit=2

# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# dataDir: data file directory. The directory where ZooKeeper saves data. By default, ZooKeeper also saves the transaction log files in this directory.
dataDir=/data/soft/zookeeper-3.4.12/data


# dataLogDir: log file directory. The directory where ZooKeeper saves log files.
dataLogDir=/data/soft/zookeeper-3.4.12/logs

# the port at which the clients will connect
# clientPort: client connection port. The port through which clients connect to the ZooKeeper server. ZooKeeper listens on this port and accepts client access requests.
clientPort=2181

# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1


# Server name and address: cluster information (server number, server address, LF communication port, election port)
# The writing format of this configuration item is special, the rules are as follows:

# server.N=YYY:A:B

# Where N is the server number, YYY is the server IP address, and A is the LF communication port, which indicates the port for information exchange between the server and the leader in the cluster. B is the election port, which indicates the port through which servers communicate with each other when electing a new leader (when the leader fails, the remaining servers will communicate with each other to select a new leader). Generally speaking, the A port of each server in the cluster is the same, and the B port of each server is also the same. However, when a pseudo cluster is used, the IP addresses are the same, only the A port and the B port are different.
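
For example, a three-server cluster section in zoo.cfg might look like the following (the IP addresses here are placeholders); in a pseudo-cluster the three IPs would be identical and only the two ports would differ per server:

```
server.1=192.168.1.101:2888:3888
server.2=192.168.1.102:2888:3888
server.3=192.168.1.103:2888:3888
```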

You don't need to modify zoo.cfg, the default configuration is fine. Next, execute the command ./zkCli.sh -server 127.0.0.1:2181 in the unzipped bin directory to connect.

Welcome to ZooKeeper!
2020-06-01 15:03:52,512 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1025] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2020-06-01 15:03:52,576 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@879] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2020-06-01 15:03:52,599 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100001140080000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0]

Next we can use CLI commands to view and manipulate nodes.

Use the ls command to view what the current ZooKeeper contains.

Command: ls /

[zk: 127.0.0.1:2181(CONNECTED) 10] ls /
[zookeeper]

A new znode "zk" is created and a string is associated with it.

Command: create /zk myData

[zk: 127.0.0.1:2181(CONNECTED) 11] create /zk myData
Created /zk
[zk: 127.0.0.1:2181(CONNECTED) 12] ls /
[zk, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 13]

Get the data of znode /zk.

Command: get /zk

[zk: 127.0.0.1:2181(CONNECTED) 13] get /zk
myData
cZxid = 0x400000008
ctime = Mon Jun 01 15:07:50 CST 2020
mZxid = 0x400000008
mtime = Mon Jun 01 15:07:50 CST 2020
pZxid = 0x400000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0

Delete the znode /zk.

Command: delete /zk

[zk: 127.0.0.1:2181(CONNECTED) 14] delete /zk
[zk: 127.0.0.1:2181(CONNECTED) 15] ls /
[zookeeper]

Due to limited space, the next article will implement the ZooKeeper application scenarios mentioned above one by one with code.

Where to store ZooKeeper's Docker configuration files

You can pull the project directly from GitHub. It only takes two steps to start ZooKeeper:

Pull the project from GitHub and execute the docker-compose up command in the ZooKeeper folder.


Summary

This is the end of this article on the most convenient way to build a ZooKeeper server. For more content on setting up ZooKeeper servers, please search for previous articles on 123WORDPRESS.COM or continue to browse the related articles below. I hope everyone will continue to support 123WORDPRESS.COM!

You may also be interested in:
  • A brief analysis of the working principle of ZooKeeper
  • Java ZooKeeper distributed lock implementation diagram
  • Detailed explanation of distributed lock based on Zookeeper
  • ZooKeeper distributed coordination service design core concepts and installation configuration
