VMware + Ubuntu 18.04 Illustrated Tutorial on Building a Hadoop Cluster Environment


Preface

This tutorial is based on my school's big data lab assignment. While working through the setup, I took screenshots of the results of each command I ran. It took nearly three hours in the library to build the environment and write this post. Staring at a screen for a long time is hard on your eyes, so remember to take breaks and do some eye exercises. If you learned something from this, please give it a thumbs up!



Clone the virtual machines in VMware (preparation: clone three virtual machines, one master and two nodes)

  1. Shut down the system in the virtual machine first
  2. Right-click the virtual machine, click Manage, and select Clone.


3. Click Next, select Full Clone, and select the path.



1. Create a Hadoop user (executed on master, node1, node2)

Execute the following commands in sequence

1. Create a hadoop user

sudo useradd -m hadoop -s /bin/bash

Set user password (enter twice)

sudo passwd hadoop

Add permissions

sudo adduser hadoop sudo
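To double-check that the user was created and is in the sudo group, an optional quick verification is:

id hadoop   # the output should list sudo among the user's groups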

Switch to hadoop user (enter the hadoop password you just set here)

su hadoop

Run screenshot display (taking the master virtual machine as an example)



2. Update apt download source (executed on master, node1, node2)

sudo apt-get update

Screenshot display (taking master as an example)


3. Install SSH and configure SSH password-free login (executed on master, node1, and node2)

1. Install SSH

sudo apt-get install openssh-server

2. Configure SSH password-free login

ssh localhost
exit 
cd ~/.ssh/ 
ssh-keygen -t rsa   # keep pressing Enter to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys

3. Password-free verification

ssh localhost
exit

Screenshot display (taking master as an example)



4. Install the Java environment (executed on master, node1, node2)

1. Install the JRE and JDK packages

sudo apt-get install default-jre default-jdk

2. Configure environment variable files

vim ~/.bashrc

3. Add to the first line of the file

export JAVA_HOME=/usr/lib/jvm/default-java
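Before relying on this path, it may be worth confirming that the default-java symlink was actually created by the packages installed above:

ls -l /usr/lib/jvm/default-java   # should point to the installed JDK directory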

4. Make the environment variables take effect

source ~/.bashrc

5. Verification

java -version

Screenshot display (taking master as an example)


5. Modify the host name (executed on master, node1, node2)

1. Open the file, delete the original host name, and write master on the master machine, node1 on node1, and node2 on node2.

sudo vim /etc/hostname
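After editing, the file on each machine should contain nothing but that machine's name (master, node1, or node2). As a sketch of an equivalent alternative on Ubuntu 18.04, hostnamectl can set the name directly:

sudo hostnamectl set-hostname master   # run this on the master machine
sudo hostnamectl set-hostname node1    # run this on node1
sudo hostnamectl set-hostname node2    # run this on node2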

2. Restart all three servers

reboot

3. After the restart succeeds, reconnect to the session and you will see that the host name has changed

Screenshot display (taking node1 as an example)



6. Modify IP mapping (executed on master, node1, node2)

View the IP addresses of each virtual machine

ifconfig -a

If the command is not found, install net-tools and run it again:

sudo apt install net-tools

In the ifconfig output, the inet address of the main network interface is this virtual machine's IP address.

On all three virtual machines, add the IP addresses and host names of all three machines to the hosts file:

sudo vim /etc/hosts
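For illustration, the entries should look roughly like the following; the 192.168.x.x addresses here are placeholders, so substitute the actual addresses reported by ifconfig on your machines:

192.168.111.128 master
192.168.111.129 node1
192.168.111.130 node2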

Take master as an example to show the screenshot


7. SSH password-free login to other nodes (executed on master)

Execute on the Master

cd ~/.ssh
rm ./id_rsa*   # delete the previously generated keys (if any)
ssh-keygen -t rsa   # keep pressing Enter to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys
scp ~/.ssh/id_rsa.pub hadoop@node1:/home/hadoop/
scp ~/.ssh/id_rsa.pub hadoop@node2:/home/hadoop/


On both node1 and node2, execute

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub # delete it after use 


Verify password-free login

ssh node1
exit
ssh node2
exit

Take master as an example to show the screenshot



8. Install Hadoop 3.2.1 (executed on master)

Some mirror download URLs are no longer valid, so use the official download address.

Download URL: Hadoop 3.2.1 (official Apache download page)

After downloading, copy it to /home/hadoop on the master via VMware Tools.
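If you prefer to fetch the archive directly on the master from the command line, a wget against the Apache release archive should also work; the URL below assumes the standard archive layout, so verify it against the official download page first:

cd /home/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz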

Unzip

cd /home/hadoop
sudo tar -zxf hadoop-3.2.1.tar.gz -C /usr/local   # unzip
cd /usr/local/
sudo mv ./hadoop-3.2.1/ ./hadoop   # rename the folder to hadoop
sudo chown -R hadoop ./hadoop   # change ownership to the hadoop user

Verify

cd /usr/local/hadoop
./bin/hadoop version 



9. Configure the Hadoop environment (be very careful with this step)

Configuring environment variables

vim ~/.bashrc

Add the following to the first line of the file

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Make the configuration effective

source ~/.bashrc

Create the data directories (in preparation for the XML configuration below)

cd /usr/local/hadoop
mkdir dfs
cd dfs
mkdir name data tmp
cd /usr/local/hadoop
mkdir tmp

Configure java environment variables for hadoop

vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
vim $HADOOP_HOME/etc/hadoop/yarn-env.sh

Add the following to the first line of both files

export JAVA_HOME=/usr/lib/jvm/default-java

(On master) Configure the worker nodes

cd /usr/local/hadoop/etc/hadoop

Delete the original localhost entry. Since we have two worker nodes, write their names into the workers file:

vim workers
node1
node2

Configure core-site.xml

vim core-site.xml

fs.default.name is the older, deprecated name for fs.defaultFS; either property works here, and since we only have one NameNode the value simply points to the master.

Also make sure that the directory /usr/local/hadoop/tmp exists.

<configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://Master:9000</value>
 </property>
 
 <property>
 <name>hadoop.tmp.dir</name>
 <value>/usr/local/hadoop/tmp</value>
 </property>
</configuration>

Configure hdfs-site.xml

vim hdfs-site.xml

For dfs.namenode.secondary.http-address, make sure the port is different from the one used in core-site.xml; otherwise the two may conflict over the same port.

Make sure /usr/local/hadoop/dfs/name and /usr/local/hadoop/dfs/data exist.

Since we only have 2 nodes, dfs.replication is set to 2

<configuration>
 <property>
 <name>dfs.namenode.secondary.http-address</name>
 <value>Master:9001</value>
 </property>
 
 <property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/usr/local/hadoop/dfs/name</value>
 </property>
 
 <property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/usr/local/hadoop/dfs/data</value>
 </property>
 
 <property>
 <name>dfs.replication</name>
 <value>2</value>
 </property>
</configuration>

Configure mapred-site.xml

vim mapred-site.xml
<configuration>
 <property> 
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

Configure yarn-site.xml

vim yarn-site.xml
<configuration>
 <property>
 <name>yarn.resourcemanager.hostname</name>
 <value>Master</value>
 </property>
 
 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value> 
 </property>
 
 <property>
 <name>yarn.nodemanager.vmem-check-enabled</name>
 <value>false</value>
 </property>
</configuration>

Compress hadoop

cd /usr/local
tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress
cd ~

Copy to node1

scp ./hadoop.master.tar.gz node1:/home/hadoop

Copy to node2

scp ./hadoop.master.tar.gz node2:/home/hadoop

Decompress on node1 and node2

sudo rm -r /usr/local/hadoop   # delete the old one (if it exists)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local   # unzip
sudo chown -R hadoop /usr/local/hadoop   # change ownership

The first time you start up, you need to format the NameNode on the Master node.

hdfs namenode -format

(Note: if you ever need to reformat the NameNode, first delete all files under the original NameNode and DataNode directories, for example with the commands below.)

# See the note above; do not run these blindly
rm -rf $HADOOP_HOME/dfs/data/*
rm -rf $HADOOP_HOME/dfs/name/*

10. Start the cluster (executed on master)

start-all.sh
mr-jobhistory-daemon.sh start historyserver

On the master, any warnings printed during startup can be ignored. Check the running daemons with:

jps
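As a rough reference (process IDs will differ, and this assumes the configuration above), jps on the master should show daemons along these lines, while node1 and node2 should show DataNode and NodeManager instead:

NameNode
SecondaryNameNode
ResourceManager
JobHistoryServer
Jps

You can also check the cluster from the master with hdfs dfsadmin -report, or open the web UIs at http://master:9870 (NameNode) and http://master:8088 (ResourceManager), which are the default ports in Hadoop 3.x.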

Run screenshot display



11. Shut down the Hadoop cluster (executed on master)

stop-all.sh
mr-jobhistory-daemon.sh stop historyserver

Run screenshot display



Summary

Setting up the environment is a fairly time-consuming task. Doing it yourself, you may run into many problems: unfamiliar Linux commands, various errors, results that differ from the tutorial, and so on. You can usually find solutions online. To learn a new technology you have to be willing to try, make mistakes, and then summarize what you learned; this builds your own approach to problem solving and strengthens your knowledge framework. Keep going!

This is the end of this graphic tutorial on how to build a Hadoop cluster environment with VMware + Ubuntu 18.04. For more information about building a Hadoop cluster with VMware Ubuntu, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • VMware Workstation Pro 16 Graphic Tutorial on Building CentOS8 Virtual Machine Cluster
  • VMware configuration hadoop to achieve pseudo-distributed graphic tutorial
  • How to install hadoop1.x under VMware virtual machine
  • Detailed explanation of VMware12 using three virtual machines Ubuntu16.04 system to build hadoop-2.7.1+hbase-1.2.4 (fully distributed)
