Illustrated tutorial on building a Hadoop high-availability cluster based on ZooKeeper

1. Introduction to High Availability

Hadoop high availability consists of HDFS high availability and YARN high availability. The two are implemented in basically the same way, but the HDFS NameNode has much higher requirements for data storage and consistency than the YARN ResourceManager, so its implementation is also more complicated. We will therefore explain it first:

1.1 Overall High Availability Architecture

The HDFS high availability architecture is as follows:

Image source: https://www.edureka.co/blog/how-to-set-up-hadoop-cluster-with-hdfs-high-availability/

The HDFS high availability architecture is mainly composed of the following components:

  • Active NameNode and Standby NameNode: two NameNodes back each other up; one is in the Active state and serves as the primary NameNode, while the other is in the Standby state and serves as the backup NameNode. Only the Active NameNode provides read and write services to clients.
  • ZKFailoverController (active/standby switch controller): ZKFailoverController runs as an independent process and controls the overall active/standby switching of the NameNodes. It detects the health of its NameNode in a timely manner and, when the Active NameNode fails, uses ZooKeeper to perform automatic election and switchover. The NameNode also supports manual switching that does not depend on ZooKeeper (see the example after this list).
  • ZooKeeper cluster: provides support for the active/standby election performed by the switch controller.
  • Shared storage system: the shared storage system is the most critical part of NameNode high availability. It stores the HDFS metadata generated by the NameNode during operation. The Active and Standby NameNodes synchronize metadata through the shared storage system; during a switchover, the new Active NameNode can continue to provide services only after confirming that its metadata is fully synchronized.
  • DataNode: besides sharing HDFS metadata through the shared storage system, the Active and Standby NameNodes also need to share the mapping between HDFS data blocks and DataNodes, so each DataNode reports data block location information to both NameNodes.
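As a hedged illustration of the manual switching mentioned above, the hdfs haadmin command can be used to query NameNode states and trigger a switchover. The nn1/nn2 IDs below are the NameNode IDs configured in hdfs-site.xml later in this tutorial; exact behavior may vary between Hadoop versions.

# Query the current state of each NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Initiate a coordinated failover from nn1 to nn2
hdfs haadmin -failover nn1 nn2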

1.2 Data Synchronization Mechanism of the QJM-Based Shared Storage System

Currently, Hadoop supports the use of Quorum Journal Manager (QJM) or Network File System (NFS) as a shared storage system. Here, the QJM cluster is used as an example: the Active NameNode first submits the EditLog to the JournalNode cluster, and then the Standby NameNode synchronizes the EditLog from the JournalNode cluster at a regular interval. When the Active NameNode goes down, the Standby NameNode can provide external services after confirming that the metadata is fully synchronized.

It should be noted that writing the EditLog to the JournalNode cluster follows a majority-write strategy (a write succeeds once more than half of the nodes have accepted it), so at least 3 JournalNode nodes are required. You can add more nodes, but the total should remain an odd number. With 2N+1 JournalNodes, up to N JournalNode failures can be tolerated: for example, 3 JournalNodes tolerate 1 failure and 5 tolerate 2.

1.3 NameNode Active/Standby Switchover

The process of NameNode implementing active/standby switching is shown in the figure below:

1. After HealthMonitor is initialized, it starts an internal thread that periodically calls the HAServiceProtocol RPC interface of the corresponding NameNode to check its health.

2. If HealthMonitor detects that the health state of the NameNode has changed, it calls back the corresponding method registered by ZKFailoverController for processing.

3. If ZKFailoverController decides that an active/standby switchover is required, it first uses ActiveStandbyElector to carry out an automatic election.

4. ActiveStandbyElector interacts with ZooKeeper to complete the automatic election.

5. After the election is complete, ActiveStandbyElector calls back the corresponding method of ZKFailoverController to notify the current NameNode whether it has become the active or the standby NameNode.

6. ZKFailoverController calls the HAServiceProtocol RPC interface of the corresponding NameNode to transition it to the Active state or the Standby state.
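As a rough illustration of the election mechanism (assuming the default parent znode /hadoop-ha and the nameservice name mycluster configured later in this tutorial), the ZooKeeper client can be used to inspect the lock znodes that ActiveStandbyElector competes for:

# Connect to any ZooKeeper node
zkCli.sh -server hadoop001:2181
# Inside the client: list the HA znodes of the nameservice;
# the children normally include ActiveStandbyElectorLock and ActiveBreadCrumb
ls /hadoop-ha/mycluster
# Show which NameNode currently holds the active role
get /hadoop-ha/mycluster/ActiveBreadCrumb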

1.4 YARN High Availability

The high availability of the YARN ResourceManager is similar to that of the HDFS NameNode, but unlike the NameNode, the ResourceManager does not have as much metadata to maintain, so its state information can be written directly to ZooKeeper, which is also relied on for the active/standby election.
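Similarly to the HDFS side, the ResourceManager HA state can be queried from the command line. The rm1/rm2 IDs below are the ones configured in yarn-site.xml later in this tutorial:

# Query the HA state of each ResourceManager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2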

2. Cluster Planning

To meet the high availability design goal, there must be at least two NameNodes (one active, one standby) and at least two ResourceManagers (one active, one standby). At the same time, to satisfy the majority-write principle, at least 3 JournalNode nodes are required. Three hosts are used here for the build, and the cluster is planned as follows:
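The original planning diagram is not reproduced here; the following layout is consistent with the configuration files below and with the process listing in section 6.1:

  • hadoop001: ZooKeeper, NameNode, DFSZKFailoverController, DataNode, JournalNode, NodeManager
  • hadoop002: ZooKeeper, NameNode, DFSZKFailoverController, DataNode, JournalNode, ResourceManager, NodeManager
  • hadoop003: ZooKeeper, DataNode, JournalNode, ResourceManager, NodeManager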

3. Prerequisites

  • JDK is installed on all servers. For installation steps, see: JDK installation under Linux.
  • A ZooKeeper cluster has been set up. For the steps, see: Zookeeper stand-alone environment and cluster environment construction.
  • Password-free SSH login is configured between all servers (a minimal sketch is shown below).
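A minimal sketch of configuring password-free SSH, assuming the root user and the three hostnames used in this tutorial (adjust the user and hosts to your environment):

# Generate a key pair on each server (press Enter to accept the defaults)
ssh-keygen -t rsa
# Copy the public key to every server, including the local one
ssh-copy-id root@hadoop001
ssh-copy-id root@hadoop002
ssh-copy-id root@hadoop003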

4. Cluster Configuration

4.1 Download and unzip

Download Hadoop. Here I downloaded the CDH version of Hadoop; the download address is: http://archive.cloudera.com/cdh5/cdh/5/

# tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz

4.2 Configure environment variables

Edit the profile file:

# vim /etc/profile

Add the following configuration:

export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH

Execute the source command to make the configuration take effect immediately:

 # source /etc/profile

4.3 Modify configuration

Go to the ${HADOOP_HOME}/etc/hadoop directory and modify the configuration file. The contents of each configuration file are as follows:

1. hadoop-env.sh

 # Specify the JDK installation location
 export JAVA_HOME=/usr/java/jdk1.8.0_201/

2. core-site.xml

<configuration>
 <property>
 <!-- Specify the communication address of the namenode's hdfs protocol file system -->
 <name>fs.defaultFS</name>
 <value>hdfs://hadoop001:8020</value>
 </property>
 <property>
 <!-- Specify the directory where the Hadoop cluster stores temporary files -->
 <name>hadoop.tmp.dir</name>
 <value>/home/hadoop/tmp</value>
 </property>
 <property>
 <!-- Address of the ZooKeeper cluster -->
 <name>ha.zookeeper.quorum</name>
 <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
 </property>
 <property>
 <!-- ZKFC connects to ZooKeeper timeout -->
 <name>ha.zookeeper.session-timeout.ms</name>
 <value>10000</value>
 </property>
</configuration>

3. hdfs-site.xml

<configuration>
 <property>
 <!-- Specify the number of HDFS replicas -->
 <name>dfs.replication</name>
 <value>3</value>
 </property>
 <property>
 <!-- The storage location of namenode node data (i.e. metadata). You can specify multiple directories to achieve fault tolerance. Multiple directories are separated by commas. -->
 <name>dfs.namenode.name.dir</name>
 <value>/home/hadoop/namenode/data</value>
 </property>
 <property>
 <!-- Storage location of datanode data (i.e. data blocks) -->
 <name>dfs.datanode.data.dir</name>
 <value>/home/hadoop/datanode/data</value>
 </property>
 <property>
 <!-- Logical name of the cluster service -->
 <name>dfs.nameservices</name>
 <value>mycluster</value>
 </property>
 <property>
 <!-- NameNode ID list -->
 <name>dfs.ha.namenodes.mycluster</name>
 <value>nn1,nn2</value>
 </property>
 <property>
 <!-- nn1's RPC communication address -->
 <name>dfs.namenode.rpc-address.mycluster.nn1</name>
 <value>hadoop001:8020</value>
 </property>
 <property>
 <!-- nn2's RPC communication address -->
 <name>dfs.namenode.rpc-address.mycluster.nn2</name>
 <value>hadoop002:8020</value>
 </property>
 <property>
 <!-- nn1's http communication address -->
 <name>dfs.namenode.http-address.mycluster.nn1</name>
 <value>hadoop001:50070</value>
 </property>
 <property>
 <!-- nn2's http communication address -->
 <name>dfs.namenode.http-address.mycluster.nn2</name>
 <value>hadoop002:50070</value>
 </property>
 <property>
 <!-- Shared storage directory for NameNode metadata on JournalNode -->
 <name>dfs.namenode.shared.edits.dir</name>
 <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/mycluster</value>
 </property>
 <property>
 <!-- Journal Edit Files storage directory -->
 <name>dfs.journalnode.edits.dir</name>
 <value>/home/hadoop/journalnode/data</value>
 </property>
 <property>
 <!-- Configure the isolation mechanism to ensure that only one NameNode is active at any given time -->
 <name>dfs.ha.fencing.methods</name>
 <value>sshfence</value>
 </property>
 <property>
 <!-- SSH password-free login is required when using sshfence mechanism-->
 <name>dfs.ha.fencing.ssh.private-key-files</name>
 <value>/root/.ssh/id_rsa</value>
 </property>
 <property>
 <!-- SSH timeout -->
 <name>dfs.ha.fencing.ssh.connect-timeout</name>
 <value>30000</value>
 </property>
 <property>
 <!-- Access proxy class, used to determine the NameNode currently in the Active state -->
 <name>dfs.client.failover.proxy.provider.mycluster</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>
 <property>
 <!-- Enable automatic failover -->
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
 </property>
</configuration>

4. yarn-site.xml

<configuration>
 <property>
 <!--Configure the auxiliary services running on the NodeManager. MapReduce programs can be run on Yarn only after mapreduce_shuffle is configured. -->
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>
 <property>
 <!-- Whether to enable log aggregation (optional) -->
 <name>yarn.log-aggregation-enable</name>
 <value>true</value>
 </property>
 <property>
 <!-- Aggregate log storage time (optional) -->
 <name>yarn.log-aggregation.retain-seconds</name>
 <value>86400</value>
 </property>
 <property>
 <!-- Enable RM HA -->
 <name>yarn.resourcemanager.ha.enabled</name>
 <value>true</value>
 </property>
 <property>
 <!-- RM cluster identifier -->
 <name>yarn.resourcemanager.cluster-id</name>
 <value>my-yarn-cluster</value>
 </property>
 <property>
 <!-- RM logical ID list -->
 <name>yarn.resourcemanager.ha.rm-ids</name>
 <value>rm1,rm2</value>
 </property>
 <property>
 <!-- RM1 service address -->
 <name>yarn.resourcemanager.hostname.rm1</name>
 <value>hadoop002</value>
 </property>
 <property>
 <!-- RM2 service address -->
 <name>yarn.resourcemanager.hostname.rm2</name>
 <value>hadoop003</value>
 </property>
 <property>
 <!-- Address of the RM1 web application -->
 <name>yarn.resourcemanager.webapp.address.rm1</name>
 <value>hadoop002:8088</value>
 </property>
 <property>
 <!-- Address of the RM2 web application -->
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>hadoop003:8088</value>
 </property>
 <property>
 <!-- Address of the ZooKeeper cluster -->
 <name>yarn.resourcemanager.zk-address</name>
 <value>hadoop001:2181,hadoop002:2181,hadoop003:2181</value>
 </property>
 <property>
 <!-- Enable AutoRecovery -->
 <name>yarn.resourcemanager.recovery.enabled</name>
 <value>true</value>
 </property>
 <property>
 <!-- Class for persistent storage-->
 <name>yarn.resourcemanager.store.class</name>
 <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
 </property>
</configuration>

5. mapred-site.xml

<configuration>
 <property>
 <!--Specify the MapReduce job to run on YARN-->
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

6. slaves

Configure the hostnames or IP addresses of all slave nodes, one per line. DataNode service and NodeManager service will be started on all slave nodes.

hadoop001
hadoop002
hadoop003

4.4 Distribution Procedure

Distribute the Hadoop installation package to the other two servers. After distribution, it is recommended to configure the Hadoop environment variables on the two servers.

# Distribute the installation package to hadoop002
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop002:/usr/app/
# Distribute the installation package to hadoop003
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop003:/usr/app/

5. Starting the Cluster

5.1 Start ZooKeeper

Start the ZooKeeper service on the three servers respectively:

 zkServer.sh start
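Optionally, verify that the ensemble has elected a leader; on each server the status output should report either leader or follower:

 zkServer.sh status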

5.2 Start Journalnode

Go to the ${HADOOP_HOME}/sbin directory of each of the three servers and start the journalnode process:

 hadoop-daemon.sh start journalnode

5.3 Initializing NameNode

Execute the NameNode initialization command on hadoop001:

hdfs namenode -format

After executing the initialization command, the contents of the NameNode metadata directory need to be copied to the other, unformatted NameNode. The metadata storage directory is the one specified by the dfs.namenode.name.dir property in hdfs-site.xml. Here we need to copy it to hadoop002:

 scp -r /home/hadoop/namenode/data hadoop002:/home/hadoop/namenode/
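As an alternative to copying the metadata manually, the official documentation also describes bootstrapping the standby NameNode directly. A hedged sketch (run on hadoop002; it generally requires the freshly formatted NameNode on hadoop001 to already be running so the metadata can be fetched from it):

# On hadoop001: start the formatted NameNode first
hadoop-daemon.sh start namenode
# On hadoop002: pull the metadata from the running NameNode
hdfs namenode -bootstrapStandby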

5.4 Initializing HA status

Use the following command on any NameNode to initialize the HA state in ZooKeeper:

 hdfs zkfc -formatZK

5.5 Start HDFS

Go to the ${HADOOP_HOME}/sbin directory on hadoop001 and start HDFS. At this point, the NameNode services on hadoop001 and hadoop002 and the DataNode services on all three servers will be started:

 start-dfs.sh

5.6 Starting YARN

Go to the ${HADOOP_HOME}/sbin directory on hadoop002 and start YARN. At this point, the ResourceManager service on hadoop002 and the NodeManager services on all three servers will be started:

 start-yarn.sh

It should be noted that the ResourceManager service on hadoop003 is usually not started at this time and needs to be started manually:

 yarn-daemon.sh start resourcemanager

6. View the cluster

6.1 View Process

After a successful startup, the process on each server should be as follows:

[root@hadoop001 sbin]# jps
4512 DFSZKFailoverController
3714 JournalNode
4114 NameNode
3668 QuorumPeerMain
5012 DataNode
4639 NodeManager
[root@hadoop002 sbin]# jps
4499 ResourceManager
4595 NodeManager
3465 QuorumPeerMain
3705 NameNode
3915 DFSZKFailoverController
5211 DataNode
3533 JournalNode
[root@hadoop003 sbin]# jps
3491 JournalNode
3942 NodeManager
4102 ResourceManager
4201 DataNode
3435 QuorumPeerMain

6.2 View the Web UI

The web UI ports of HDFS and YARN are 50070 and 8088 respectively. The interface should be as follows:

At this time, the NameNode on hadoop001 is in the active state:

The NameNode on hadoop002 is in the standby state:



The ResourceManager on hadoop002 is in the active state:



The ResourceManager on hadoop003 is in the standby state:



At the same time, the interface also shows relevant information about the Journal Manager:


7. Secondary startup of the cluster

The initial startup of the cluster above involves some necessary one-time initialization, so the process is somewhat cumbersome. Once the cluster has been built, however, starting it again is straightforward. The steps are as follows (first make sure the ZooKeeper cluster is started):

Start HDFS on hadoop001. This starts all services related to HDFS high availability, including NameNode, DataNode, and JournalNode:

 start-dfs.sh

Start YARN on hadoop002:

 start-yarn.sh

At this time, the ResourceManager service on hadoop003 is usually not started and needs to be started manually:

 yarn-daemon.sh start resourcemanager

References

The above construction steps mainly reference the official documentation:

  • HDFS High Availability Using the Quorum Journal Manager
  • ResourceManager High Availability

Summary

The above is a tutorial on building a Hadoop high-availability cluster based on ZooKeeper. I hope it is helpful to you; if you have any questions, please leave a message and I will reply in time.
