VMware + Ubuntu 18.04 Illustrated Tutorial on Building a Hadoop Cluster Environment


Preface

This tutorial is based on my school's big data lab assignment. While working through the setup, I took screenshots of the results of each command I ran. It took nearly three hours in the library to build the environment and write this post. Staring at a screen for a long time is hard on your eyes, so remember to take breaks and do some eye exercises. If you learned something from this, please give it a thumbs up!



Clone the virtual machines in VMware (preparation: clone three virtual machines, one master and two nodes)

  1. Shut down the system in the virtual machine first
  2. Right-click the virtual machine, click Manage, and select Clone.


3. Click Next, select Full Clone, and select the path.



1. Create a Hadoop user (executed on master, node1, node2)

Execute the following commands in sequence

1. Create a hadoop user

sudo useradd -m hadoop -s /bin/bash

Set user password (enter twice)

sudo passwd hadoop

Add permissions

sudo adduser hadoop sudo
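To double-check that the user was created and is in the sudo group, an optional quick verification is:

id hadoop   # the output should list sudo among the user's groups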

Switch to hadoop user (enter the hadoop password you just set here)

su hadoop

Run screenshot display (taking the master virtual machine as an example)



2. Update apt download source (executed on master, node1, node2)

sudo apt-get update

Screenshot display (taking master as an example)


3. Install SSH and configure SSH password-free login (executed on master, node1, and node2)

1. Install SSH

sudo apt-get install openssh-server

2. Configure SSH password-free login

ssh localhost
exit 
cd ~/.ssh/ 
ssh-keygen -t rsa   # keep pressing Enter to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys

3. Password-free verification

ssh localhost
exit

Screenshot display (taking master as an example)



4. Install the Java environment (executed on master, node1, node2)

1. Install the JRE and JDK packages

sudo apt-get install default-jre default-jdk

2. Configure environment variable files

vim ~/.bashrc

3. Add to the first line of the file

export JAVA_HOME=/usr/lib/jvm/default-java
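Before relying on this path, it may be worth confirming that the default-java symlink was actually created by the packages installed above:

ls -l /usr/lib/jvm/default-java   # should point to the installed JDK directory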

4. Make the environment variables take effect

source ~/.bashrc

5. Verification

java -version

Screenshot display (taking master as an example)


5. Modify the host name (executed on master, node1, node2)

1. Open the file, delete the original host name, and write master on the master machine, node1 on node1, and node2 on node2.

sudo vim /etc/hostname
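After editing, the file on each machine should contain nothing but that machine's name (master, node1, or node2). As a sketch of an equivalent alternative on Ubuntu 18.04, hostnamectl can set the name directly:

sudo hostnamectl set-hostname master   # run this on the master machine
sudo hostnamectl set-hostname node1    # run this on node1
sudo hostnamectl set-hostname node2    # run this on node2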

2. Restart all three servers

reboot

3. After the restart succeeds, reconnect to the session and you will see that the host name has changed

Screenshot display (taking node1 as an example)



6. Modify IP mapping (executed on master, node1, node2)

View the IP addresses of each virtual machine

ifconfig -a

If the command is not found, install net-tools and run it again:

sudo apt install net-tools

In the ifconfig output, the inet address of the main network interface is this virtual machine's IP address.

On all three virtual machines, add the IP addresses and host names of all three machines to the hosts file:

sudo vim /etc/hosts
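For illustration, the entries should look roughly like the following; the 192.168.x.x addresses here are placeholders, so substitute the actual addresses reported by ifconfig on your machines:

192.168.111.128 master
192.168.111.129 node1
192.168.111.130 node2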

Take master as an example to show the screenshot


7. SSH password-free login to other nodes (executed on master)

Execute on the Master

cd ~/.ssh
rm ./id_rsa*   # delete the previously generated keys (if any)
ssh-keygen -t rsa   # keep pressing Enter to accept the defaults
cat ./id_rsa.pub >> ./authorized_keys
scp ~/.ssh/id_rsa.pub hadoop@node1:/home/hadoop/
scp ~/.ssh/id_rsa.pub hadoop@node2:/home/hadoop/


On both node1 and node2, execute

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub # delete it after use 


Verify password-free login

ssh node1
exit
ssh node2
exit

Take master as an example to show the screenshot



8. Install Hadoop 3.2.1 (executed on master)

Some mirror download URLs are no longer valid, so use the official download address.

Download URL: Hadoop 3.2.1 (official Apache download page)

After downloading, copy it to /home/hadoop on the master via VMware Tools.
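If you prefer to fetch the archive directly on the master from the command line, a wget against the Apache release archive should also work; the URL below assumes the standard archive layout, so verify it against the official download page first:

cd /home/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz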

Unzip

cd /home/hadoop
sudo tar -zxf hadoop-3.2.1.tar.gz -C /usr/local   # unzip
cd /usr/local/
sudo mv ./hadoop-3.2.1/ ./hadoop   # rename the folder to hadoop
sudo chown -R hadoop ./hadoop   # change ownership to the hadoop user

Verify

cd /usr/local/hadoop
./bin/hadoop version 



9. Configure the Hadoop environment (be very careful with this step)

Configuring environment variables

vim ~/.bashrc

Add the following to the first line of the file

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Make the configuration effective

source ~/.bashrc

Create the data directories (in preparation for the XML configuration below)

cd /usr/local/hadoop
mkdir dfs
cd dfs
mkdir name data tmp
cd /usr/local/hadoop
mkdir tmp

Configure java environment variables for hadoop

vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
vim $HADOOP_HOME/etc/hadoop/yarn-env.sh

Add the following to the first line of both files

export JAVA_HOME=/usr/lib/jvm/default-java

(On master) Configure the worker nodes

cd /usr/local/hadoop/etc/hadoop

Delete the original localhost entry. Since we have two worker nodes, write their names into the workers file:

vim workers
node1
node2

Configure core-site.xml

vim core-site.xml

fs.default.name is the older, deprecated name for fs.defaultFS; either property works here, and since we only have one NameNode the value simply points to the master.

Also make sure that the directory /usr/local/hadoop/tmp exists.

<configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://Master:9000</value>
 </property>
 
 <property>
 <name>hadoop.tmp.dir</name>
 <value>/usr/local/hadoop/tmp</value>
 </property>
</configuration>

Configure hdfs-site.xml

vim hdfs-site.xml

For dfs.namenode.secondary.http-address, make sure the port is different from the one used in core-site.xml; otherwise the two may conflict over the same port.

Make sure /usr/local/hadoop/dfs/name and /usr/local/hadoop/dfs/data exist.

Since we only have 2 nodes, dfs.replication is set to 2

<configuration>
 <property>
 <name>dfs.namenode.secondary.http-address</name>
 <value>Master:9001</value>
 </property>
 
 <property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/usr/local/hadoop/dfs/name</value>
 </property>
 
 <property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/usr/local/hadoop/dfs/data</value>
 </property>
 
 <property>
 <name>dfs.replication</name>
 <value>2</value>
 </property>
</configuration>

Configure mapred-site.xml

vim mapred-site.xml
<configuration>
 <property> 
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>

Configure yarn-site.xml

vim yarn-site.xml
<configuration>
 <property>
 <name>yarn.resourcemanager.hostname</name>
 <value>Master</value>
 </property>
 
 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value> 
 </property>
 
 <property>
 <name>yarn.nodemanager.vmem-check-enabled</name>
 <value>false</value>
 </property>
</configuration>

Compress hadoop

cd /usr/local
tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress
cd ~

Copy to node1

scp ./hadoop.master.tar.gz node1:/home/hadoop

Copy to node2

scp ./hadoop.master.tar.gz node2:/home/hadoop

Decompress on node1 and node2

sudo rm -r /usr/local/hadoop   # delete the old one (if it exists)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local   # unzip
sudo chown -R hadoop /usr/local/hadoop   # change ownership

The first time you start up, you need to format the NameNode on the Master node.

hdfs namenode -format

(Note: if you ever need to reformat the NameNode, first delete all files under the original NameNode and DataNode directories, for example with the commands below.)

# See the note above; do not run these blindly
rm -rf $HADOOP_HOME/dfs/data/*
rm -rf $HADOOP_HOME/dfs/name/*

10. Start the cluster (executed on master)

start-all.sh
mr-jobhistory-daemon.sh start historyserver

On the master, any warnings printed during startup can be ignored. Check the running daemons with:

jps
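As a rough reference (process IDs will differ, and this assumes the configuration above), jps on the master should show daemons along these lines, while node1 and node2 should show DataNode and NodeManager instead:

NameNode
SecondaryNameNode
ResourceManager
JobHistoryServer
Jps

You can also check the cluster from the master with hdfs dfsadmin -report, or open the web UIs at http://master:9870 (NameNode) and http://master:8088 (ResourceManager), which are the default ports in Hadoop 3.x.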

Run screenshot display



11. Shut down the Hadoop cluster (executed on master)

stop-all.sh
mr-jobhistory-daemon.sh stop historyserver

Run screenshot display



Summary

Setting up the environment is a fairly time-consuming task. Doing it yourself, you may run into many problems: unfamiliar Linux commands, various errors, results that differ from the tutorial, and so on. You can usually find solutions online. To learn a new technology you have to be willing to try, make mistakes, and then summarize what you learned; this builds your own approach to problem solving and strengthens your knowledge framework. Keep going!

This is the end of this graphic tutorial on how to build a Hadoop cluster environment with VMware + Ubuntu 18.04. For more information about building a Hadoop cluster with VMware Ubuntu, please search for previous articles on 123WORDPRESS.COM or continue to browse the following related articles. I hope you will support 123WORDPRESS.COM in the future!

You may also be interested in:
  • VMware Workstation Pro 16 Graphic Tutorial on Building CentOS8 Virtual Machine Cluster
  • VMware configuration hadoop to achieve pseudo-distributed graphic tutorial
  • How to install hadoop1.x under VMware virtual machine
  • Detailed explanation of VMware12 using three virtual machines Ubuntu16.04 system to build hadoop-2.7.1+hbase-1.2.4 (fully distributed)
