Building a Hadoop 2.10 high-availability (HA) cluster on CentOS 7

This article describes how to build a high-availability Hadoop 2.10 cluster on CentOS 7. First, prepare six machines: 2 namenodes (NN), 4 datanodes (DN), and 3 journalnodes (JN).

IP               Hostname   Processes
192.168.30.141   s141       nn1 (namenode), zkfc (DFSZKFailoverController), zk (QuorumPeerMain)
192.168.30.142   s142       dn (datanode), jn (journalnode), zk (QuorumPeerMain)
192.168.30.143   s143       dn (datanode), jn (journalnode), zk (QuorumPeerMain)
192.168.30.144   s144       dn (datanode), jn (journalnode)
192.168.30.145   s145       dn (datanode)
192.168.30.146   s146       nn2 (namenode), zkfc (DFSZKFailoverController)

Since I use VMware virtual machines, I configure one machine, clone it to create the rest, and then change each clone's hostname and IP, so the configuration of every machine stays consistent. On each machine, add the hdfs user and group, configure the JDK environment, and install Hadoop. This high-availability cluster is built under the hdfs user. For background, see: CentOS 7 builds hadoop 2.10 pseudo-distributed mode. A minimal sketch of the per-machine preparation follows.
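The commands below are one way to do that preparation (run as root); the JDK path is only an example, so adjust all paths to your own installation:

# Set the hostname (example for s141)
hostnamectl set-hostname s141

# Create the hdfs user and group used throughout this guide
groupadd hdfs
useradd -g hdfs hdfs
passwd hdfs

# Example environment variables appended to /etc/profile (paths are assumptions)
export JAVA_HOME=/opt/soft/jdk1.8.0_201
export HADOOP_HOME=/opt/soft/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin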

Here are the steps and details for installing the high-availability cluster:

1. Set the hostname and hosts for each machine

Modify the hosts file. Once the hosts are set, machines can be reached by hostname, which is more convenient. Modify it as follows, and give every machine the same file (a sketch of distributing it follows the list):

127.0.0.1 localhost
192.168.30.141 s141
192.168.30.142 s142
192.168.30.143 s143
192.168.30.144 s144
192.168.30.145 s145
192.168.30.146 s146
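One possible way to push the same hosts file from s141 to the other machines, assuming root SSH access (a password prompt at this stage is fine):

# Run as root on s141
for h in s142 s143 s144 s145 s146; do
  scp /etc/hosts root@$h:/etc/hosts
done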

2. Set up SSH passwordless login. Since s141 and s146 are both namenodes, you need to be able to log in to all machines from these two without a password. It is best to set up passwordless login for both the hdfs user and the root user.

We use s141 as nn1 and s146 as nn2. Both must be able to ssh to the other machines without a password. To achieve this, generate a key pair under the hdfs user on s141 and on s146, then send their public keys to every machine (including themselves) and append them to ~/.ssh/authorized_keys.

Generate a key pair on the s141 and s146 machines:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Append the contents of id_rsa.pub to /home/hdfs/.ssh/authorized_keys on each of the s141-s146 machines. If a machine does not yet have an authorized_keys file, you can simply rename id_rsa.pub to authorized_keys; if the file already exists, append the contents of id_rsa.pub to it. Use scp for the remote copy:

Copy the s141 public key to all machines (run on s141):

scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_141.pub

Copy the s146 public key to all machines (run on s146):

scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_146.pub

On each machine, append the copied keys to the authorized_keys file with cat:

cat id_rsa_141.pub >> authorized_keys
cat id_rsa_146.pub >> authorized_keys

Then change the permissions of authorized_keys to 644 (note: SSH passwordless login often fails because of this permission problem):

chmod 644 authorized_keys
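To verify that passwordless login works, you can loop over all machines from s141 (and again from s146) as the hdfs user; each command should print the remote hostname without asking for a password:

for h in s141 s142 s143 s144 s145 s146; do
  ssh hdfs@$h hostname
done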

3. Edit the Hadoop configuration files (${hadoop_home}/etc/hadoop/)

Configuration details:

Note: s141 and s146 have exactly the same configuration, especially ssh.

1) Configure nameservice

[hdfs-site.xml]
<property>
 <name>dfs.nameservices</name>
 <value>mycluster</value>
</property>


2) dfs.ha.namenodes.[nameservice ID]

[hdfs-site.xml]
<!-- The two namenode IDs under mycluster -->
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>

3) dfs.namenode.rpc-address.[nameservice ID].[name node ID]

[hdfs-site.xml]
Configure the rpc address of each nn.
<property>
 <name>dfs.namenode.rpc-address.mycluster.nn1</name>
 <value>s141:8020</value>
</property>
<property>
 <name>dfs.namenode.rpc-address.mycluster.nn2</name>
 <value>s146:8020</value>
</property>

4) dfs.namenode.http-address.[nameservice ID].[name node ID]
Configure the webui port

[hdfs-site.xml]
<property>
 <name>dfs.namenode.http-address.mycluster.nn1</name>
 <value>s141:50070</value>
</property>
<property>
 <name>dfs.namenode.http-address.mycluster.nn2</name>
 <value>s146:50070</value>
</property>

5) dfs.namenode.shared.edits.dir
The namenode shared edits directory. Choose three journalnode machines; here we use s142, s143, and s144.

[hdfs-site.xml]
<property>
 <name>dfs.namenode.shared.edits.dir</name>
 <value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value>
</property>

6) dfs.client.failover.proxy.provider.[nameservice ID]
Configure the HA failover proxy provider Java class (this value is fixed); clients use it to determine which namenode is currently active.

[hdfs-site.xml]
<property>
 <name>dfs.client.failover.proxy.provider.mycluster</name>
 <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

7) dfs.ha.fencing.methods
A list of scripts or Java classes used to fence the active namenode during a failover.

[hdfs-site.xml]
<property>
 <name>dfs.ha.fencing.methods</name>
 <value>sshfence</value>
</property>

<property>
 <name>dfs.ha.fencing.ssh.private-key-files</name>
 <value>/home/hdfs/.ssh/id_rsa</value>
</property>

8) fs.defaultFS
Configure the HDFS file system name service; mycluster here is the dfs.nameservices value configured above.

[core-site.xml]
<property>
 <name>fs.defaultFS</name>
 <value>hdfs://mycluster</value>
</property>

9) dfs.journalnode.edits.dir
Configure the local path where JN stores edits.

[hdfs-site.xml]
<property>
 <name>dfs.journalnode.edits.dir</name>
 <value>/home/hdfs/hadoop/journal</value>
</property>

Complete configuration file:

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
 <name>fs.defaultFS</name>
 <value>hdfs://mycluster/</value>
 </property>
 <property>
 <name>hadoop.tmp.dir</name>
 <value>/home/hdfs/hadoop</value>
 </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
   <name>dfs.replication</name>
   <value>3</value>
 </property>
 <property>
   <name>dfs.hosts</name>
   <value>/opt/soft/hadoop/etc/dfs.include.txt</value>
 </property>
 <property>
   <name>dfs.hosts.exclude</name>
   <value>/opt/soft/hadoop/etc/dfs.hosts.exclude.txt</value>
 </property>
 <property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
 </property>
 <property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>s141:8020</value>
 </property>
 <property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>s146:8020</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>s141:50070</value>
 </property>
 <property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>s146:50070</value>
 </property>
 <property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value>
 </property>
 <property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
 </property>

 <property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
 </property>
 <property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hdfs/hadoop/journal</value>
 </property>
</configuration>
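Note that the dfs.hosts and dfs.hosts.exclude files referenced above must exist on the namenodes (the exclude file may be empty), otherwise the namenode may fail to start or may refuse datanode registrations. A minimal sketch, assuming the datanodes are s142-s145 as in the table at the top:

cat > /opt/soft/hadoop/etc/dfs.include.txt <<EOF
s142
s143
s144
s145
EOF
touch /opt/soft/hadoop/etc/dfs.hosts.exclude.txt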

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>

<configuration>

<!-- Site specific YARN configuration properties -->
 <property>
   <name>yarn.resourcemanager.hostname</name>
   <value>s141</value>
 </property>
 <property>
    <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
</configuration>

4. Deployment Details

1) Start the journalnode process on each of the JN nodes (s142, s143, s144):

hadoop-daemon.sh start journalnode
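One way to do this from s141 in a single pass, assuming Hadoop's sbin directory is on the PATH for non-interactive shells (otherwise log in to each machine and run the command directly):

for h in s142 s143 s144; do
  ssh hdfs@$h "hadoop-daemon.sh start journalnode"
done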

2) After starting jn, synchronize disk metadata between the two NNs

a) If it is a brand new cluster, format the file system first. This only needs to be executed on one NN.
[s141|s146]

hadoop namenode -format

b) If you are converting a non-HA cluster to an HA cluster, copy the metadata of the original namenode to the other namenode.

Step 1: On s141, copy the Hadoop metadata to the corresponding directory on s146:

scp -r /home/hdfs/hadoop/dfs hdfs@s146:/home/hdfs/hadoop/

Step 2: Run the following command on the new namenode (the unformatted one, s146 in my case) to bootstrap it in standby mode. Note: the s141 namenode must already be running (you can start it with: hadoop-daemon.sh start namenode).

hdfs namenode -bootstrapStandby

If the s141 namenode is not running, the command fails.

So first start the s141 namenode by executing on s141:

hadoop-daemon.sh start namenode

Then run the standby bootstrap command again on s146. Note: if you are prompted whether to re-format, choose N.

Step 3: Execute the following command on one of the namenodes to transfer the edit logs to the journalnodes.

hdfs namenode -initializeSharedEdits

If a java.nio.channels.OverlappingFileLockException error is reported during execution, it means the namenode is still running and needs to be stopped first (hadoop-daemon.sh stop namenode).

After execution, check whether s142, s143, and s144 now contain edits data: the mycluster directory should have been created under the journal directory, with edit log files inside. For example:
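Assuming the journal directory configured earlier (dfs.journalnode.edits.dir), a directory like the following should now exist on each of s142, s143, and s144:

# Run on s142, s143, and s144; it should list edits files and a VERSION file
ls /home/hdfs/hadoop/journal/mycluster/current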

Step 4: Start all nodes.

On s141, start the namenode and all datanodes:

hadoop-daemon.sh start namenode
hadoop-daemons.sh start datanode

Start the name node on s146

hadoop-daemon.sh start namenode

At this time, visit http://192.168.30.141:50070/ and http://192.168.30.146:50070/ in the browser and you will find that both namenodes are standby.

At this point you need to manually switch one of them to the active state. Here we make s141 (nn1) active:

hdfs haadmin -transitionToActive nn1

s141 is now active.

Common commands of hdfs haadmin:
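For reference, these are standard hdfs haadmin subcommands (nn1 and nn2 are the namenode IDs configured above):

hdfs haadmin -getServiceState nn1      # print whether nn1 is active or standby
hdfs haadmin -transitionToActive nn1   # manually make nn1 active
hdfs haadmin -transitionToStandby nn1  # manually make nn1 standby
hdfs haadmin -failover nn1 nn2         # fail over from nn1 to nn2
hdfs haadmin -checkHealth nn1          # check the health of nn1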

At this point, high availability with manual failover is configured, but this approach is not intelligent: it cannot detect a failure on its own. The next section therefore covers automatic failover configuration.

5. Automatic failover configuration

Two components need to be introduced: a ZooKeeper quorum and the ZK failover controller (ZKFC).

Build a ZooKeeper cluster on three machines: s141, s142, and s143. Download ZooKeeper from: http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6

1) Unzip zookeeper:

tar -xzvf apache-zookeeper-3.5.6-bin.tar.gz -C /opt/soft/zookeeper-3.5.6

2) Configure environment variables: add the ZooKeeper variables to /etc/profile, then re-source the file, for example as shown below.
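A sketch of the lines to append, assuming the extracted ZooKeeper directory is /opt/soft/zookeeper-3.5.6 (adjust the path to wherever the bin directory actually ends up):

export ZOOKEEPER_HOME=/opt/soft/zookeeper-3.5.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin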

Then reload the profile:

source /etc/profile

3) Edit the ZooKeeper configuration file (conf/zoo.cfg) and keep it identical on all three machines:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/home/hdfs/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=s141:2888:3888
server.2=s142:2888:3888
server.3=s143:2888:3888

4) Create the myid file in the dataDir directory configured in zoo.cfg (/home/hdfs/zookeeper) on each ZooKeeper machine:

On s141, create /home/hdfs/zookeeper/myid with the value 1 (corresponding to server.1 in zoo.cfg).

On s142, create /home/hdfs/zookeeper/myid with the value 2 (corresponding to server.2 in zoo.cfg).

On s143, create /home/hdfs/zookeeper/myid with the value 3 (corresponding to server.3 in zoo.cfg).
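One way to create these files, run under the hdfs user on the machine named in each comment:

# on s141:
mkdir -p /home/hdfs/zookeeper && echo 1 > /home/hdfs/zookeeper/myid
# on s142:
mkdir -p /home/hdfs/zookeeper && echo 2 > /home/hdfs/zookeeper/myid
# on s143:
mkdir -p /home/hdfs/zookeeper && echo 3 > /home/hdfs/zookeeper/myid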

5) Start ZooKeeper on each of the three machines:

zkServer.sh start

If startup succeeds, the QuorumPeerMain process will appear. You can verify it as follows.
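Both checks below use standard ZooKeeper/JDK tools; run them on each of s141, s142, and s143:

zkServer.sh status   # should report Mode: leader on one machine and Mode: follower on the others
jps                  # should list QuorumPeerMain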

Next, update the HDFS configuration for automatic failover:

1) Stop all hdfs processes

stop-all.sh

2) Configure hdfs-site.xml to enable automatic failover.

[hdfs-site.xml]
<property>
 <name>dfs.ha.automatic-failover.enabled</name>
 <value>true</value>
</property>

3) Configure core-site.xml to specify the ZooKeeper connection addresses.

<property>
 <name>ha.zookeeper.quorum</name>
 <value>s141:2181,s142:2181,s143:2181</value>
</property>

4) Distribute the above two files to all nodes.

5) On one of the namenodes (s141), initialize the HA state in ZooKeeper:

hdfs zkfc -formatZK

If the command completes without errors, the initialization succeeded. You can also verify it in ZooKeeper:
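One way to check, assuming the default parent znode /hadoop-ha: hdfs zkfc -formatZK creates a znode named after the nameservice, so the mycluster node should be visible from the ZooKeeper CLI:

zkCli.sh -server s141:2181
# inside the zkCli shell:
ls /hadoop-ha        # should show [mycluster]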

6) Start the HDFS cluster

start-dfs.sh

Check the processes on each machine with jps; they should match the process table at the beginning of this article.

After a successful startup, take a look at the web UI (http://192.168.30.141:50070/ and http://192.168.30.146:50070/). In my case, s146 is active and s141 is in standby.

At this point, the Hadoop HA cluster with automatic failover is complete.

Summary

The above is my introduction to building a Hadoop 2.10 high-availability (HA) cluster on CentOS 7. I hope it is helpful to you!
