Detailed steps to install Hadoop cluster under Linux

1. Create a hadoop directory under /usr, copy the installation package into it, and extract the archive
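Step 1 can be sketched roughly as follows. The helper name extract_hadoop and the archive name hadoop-2.6.0.tar.gz are assumptions for illustration; use the file name of the package you actually downloaded.

```shell
# Sketch: unpack a Hadoop release tarball into an install directory.
# extract_hadoop is a hypothetical helper, not part of Hadoop.
extract_hadoop() {
    local archive="$1"   # path to the tarball (assumed to be a .tar.gz)
    local dest="$2"      # install directory, e.g. /usr/hadoop
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$dest"
}

# Usage (assumed archive name and location):
#   extract_hadoop /usr/hadoop/hadoop-2.6.0.tar.gz /usr/hadoop
```

After extraction, /usr/hadoop/hadoop-2.6.0 should exist, which is the path the following steps rely on.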

2. Open /etc/profile with vim (vim /etc/profile) and append the following environment variables

#hadoop
export HADOOP_HOME=/usr/hadoop/hadoop-2.6.0
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
export PATH=$PATH:$HADOOP_HOME/bin 

3. Reload the profile so the changes take effect

source /etc/profile 
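A quick way to confirm that the variables took effect is sketched below. check_hadoop_env is a hypothetical helper written for this article; it only checks that HADOOP_HOME is set and that the hadoop launcher exists under it.

```shell
# Sketch: verify that HADOOP_HOME points at a usable installation.
# check_hadoop_env is a hypothetical helper, not part of Hadoop.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable at $HADOOP_HOME/bin/hadoop" >&2
        return 1
    fi
    echo "HADOOP_HOME=$HADOOP_HOME"
}

# Usage:
#   check_hadoop_env && hadoop version
```

If the check passes, hadoop version should print the release number, since $HADOOP_HOME/bin is now on the PATH.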

4. Change to the Hadoop configuration directory

cd /usr/hadoop/hadoop-2.6.0/etc/hadoop 

5. Edit the configuration files

(1) Open hadoop-env.sh with vim and set JAVA_HOME to the location of your JDK

export JAVA_HOME=/usr/java/jdk1.8.0_181 

(2) Open core-site.xml with vim (z1 is the IP address or hostname of the master node; change it to your own)

<configuration>
<property>
        <name>hadoop.tmp.dir</name>
        <value>file:/root/hadoop/tmp</value>
</property>
<!--NameNode address on port 9000; fs.default.name is the deprecated name of fs.defaultFS-->
<property>
        <name>fs.default.name</name>
        <value>hdfs://z1:9000</value>
</property>
<!--Enable the trash feature; the retention interval is in minutes (10080 = 7 days)-->
<property>
    <name>fs.trash.interval</name>
    <value>10080</value>
</property>
<!--I/O buffer size in bytes; the best value depends on server performance-->
<property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
</property>
</configuration>
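Hadoop does not reject a property name it does not recognize, so a misspelled name simply leaves the default in place. As a rough aid for spotting such typos, the sketch below prints every property name in a configuration file so it can be compared against the official defaults list. list_property_names is a hypothetical helper and assumes each <name> element sits on its own line, as in the files shown here.

```shell
# Sketch: print every <name> value found in a Hadoop *-site.xml file,
# one per line. list_property_names is a hypothetical helper.
list_property_names() {
    # Keep only the text between <name> and </name> on matching lines
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1"
}

# Usage:
#   list_property_names core-site.xml
```

Once the cluster configuration is in place, hdfs getconf -confKey with a key name can also be used to see which value Hadoop actually resolved for that key.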

(3) Hadoop 2.6.0 ships only a template, mapred-site.xml.template, not mapred-site.xml itself. Copy the template to mapred-site.xml, then open the copy

cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

(z1 is the IP address or hostname of the master node; change it to your own)

<configuration>
<property>
<!--Specify Mapreduce to run on yarn-->
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
<!--Enable uber mode, which runs small MapReduce jobs inside the ApplicationMaster's JVM-->
<property>
      <name>mapreduce.job.ubertask.enable</name>
      <value>true</value>
</property>
<property>
      <name>mapred.job.tracker</name>
      <value>z1:9001</value>
</property>
 
<property>
<name>mapreduce.jobhistory.address</name>
<value>z1:10020</value>
</property>
</configuration> 

(4) Open yarn-site.xml

vim yarn-site.xml

(z1 is the IP address or hostname of the master node; change it to your own)

<configuration>
 
<!-- Site specific YARN configuration properties -->
 
<!--Configure the location of the yarn master node-->
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>z1</value>
</property>
<property>
<!--Address clients use to submit applications to the ResourceManager-->
<description>The address of the applications manager interface in the RM.</description>
     <name>yarn.resourcemanager.address</name>
     <value>z1:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>z1:8030</value>
</property>
 
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>z1:8088</value>
</property>
 
<property>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>z1:8090</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>z1:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>z1:8033</value>
</property>
<property><!--Auxiliary service that lets reducers fetch map output during the MapReduce shuffle-->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
<!--Memory settings: upper limit for a single YARN container allocation-->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2024</value>
  <description>Maximum memory a single container request can be allocated, unit: MB, default: 8192 MB</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
 
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
 
 
</configuration>
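One thing worth checking in this file is how the two memory values relate: a container larger than yarn.nodemanager.resource.memory-mb cannot be placed on that node, so a maximum allocation above the node's memory is never fully usable. The sketch below compares the two numbers; check_container_fits is a hypothetical helper, and the values are passed in by hand rather than read from the XML.

```shell
# Sketch: warn when the largest allowed container cannot fit on a node.
# check_container_fits is a hypothetical helper, not part of YARN.
check_container_fits() {
    local max_alloc_mb="$1"   # yarn.scheduler.maximum-allocation-mb
    local node_mem_mb="$2"    # yarn.nodemanager.resource.memory-mb
    if [ "$max_alloc_mb" -gt "$node_mem_mb" ]; then
        echo "warning: max allocation ${max_alloc_mb} MB exceeds node memory ${node_mem_mb} MB"
        return 1
    fi
    echo "ok: max allocation ${max_alloc_mb} MB fits in node memory ${node_mem_mb} MB"
}

# With the values used in this article:
#   check_container_fits 2024 1024
```

With the settings shown above the check reports a warning, which suggests either raising the node memory or lowering the maximum allocation on machines that have the RAM to spare.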
                                                    

(5) Open hdfs-site.xml

vim hdfs-site.xml 

<configuration>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/hadoop/hadoop-2.6.0/hadoopDesk/namenodeDatas</value>
</property>
 <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/hadoop/hadoop-2.6.0/hadoopDatas/namenodeDatas</value>
    </property>
<property>
<!--Number of copies-->
<name>dfs.replication</name>
<value>3</value>
</property>
<!--Disable HDFS permission checking-->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!--HDFS block size: 128 MB-->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
</configuration>
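The block size value is given in bytes. The arithmetic behind 134217728, and a rough estimate of how many blocks a file occupies, can be sketched as follows. blocks_for is a hypothetical helper that uses ceiling division; the 300 MB file is an assumed example.

```shell
# 128 MB expressed in bytes, as used for the block size setting
block_size=$((128 * 1024 * 1024))
echo "$block_size"   # prints 134217728

# Sketch: number of HDFS blocks needed for a file of a given size in bytes.
# blocks_for is a hypothetical helper (ceiling division).
blocks_for() {
    local file_bytes="$1" block_bytes="$2"
    echo $(( (file_bytes + block_bytes - 1) / block_bytes ))
}

# A 300 MB file needs two full blocks plus one partial block
blocks_for $((300 * 1024 * 1024)) "$block_size"   # prints 3
```

With a replication factor of 3, each of those blocks is stored three times across the DataNodes.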

6. Edit the slaves file and add the master and slave nodes

vim slaves

List your own master and slave hostnames, one per line (mine are z1, z2, and z3)
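Instead of typing the hostnames in vim, the file can also be written from the shell. This is a minimal sketch; write_slaves is a hypothetical helper, and z1, z2, z3 are the hostnames used in this article.

```shell
# Sketch: write one hostname per line into the slaves file.
# write_slaves is a hypothetical helper, not part of Hadoop.
write_slaves() {
    local file="$1"
    shift
    # printf repeats the format for every remaining argument
    printf '%s\n' "$@" > "$file"
}

# Usage, from /usr/hadoop/hadoop-2.6.0/etc/hadoop:
#   write_slaves slaves z1 z2 z3
```

Note that this overwrites the existing file, including the default localhost entry.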

7. Copy the files to the other virtual machines

scp -r /etc/profile root@z2:/etc/profile #Distribute the environment variable profile file to the z2 node
scp -r /etc/profile root@z3:/etc/profile #Distribute the environment variable profile file to the z3 node
scp -r /usr/hadoop root@z2:/usr/ #Distribute the hadoop file to the z2 node
scp -r /usr/hadoop root@z3:/usr/ #Distribute the hadoop file to the z3 node
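With more than two slave nodes, the same copies are easier to express as a loop. The sketch below assumes root SSH access to every node, as the commands above do. distribute and the SCP override variable are hypothetical names introduced here so the loop can be dry-run before anything is copied.

```shell
# Command used for copying; override it (for example SCP=echo) for a dry run.
# Both SCP and distribute are hypothetical names, not part of Hadoop.
SCP="${SCP:-scp}"

# Sketch: copy the profile and the Hadoop directory to each node given.
distribute() {
    local node
    for node in "$@"; do
        "$SCP" -r /etc/profile "root@${node}:/etc/profile" || return 1
        "$SCP" -r /usr/hadoop "root@${node}:/usr/" || return 1
    done
}

# Dry run, then the real copy:
#   SCP=echo distribute z2 z3
#   distribute z2 z3
```

The loop stops at the first failed copy, so a node that is unreachable is noticed before the cluster is started.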

Then reload the profile on each of the two slave nodes so the environment variables take effect

source /etc/profile

8. Format the NameNode (run on the master node only)

First run jps to make sure no Hadoop daemons are already running, then format the NameNode:

hadoop namenode -format

When you see Exiting with status 0, it means the formatting is successful.
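The format command prints a lot of log output, so the success line is easy to miss. One option is to capture the output and search it, as sketched below. format_succeeded is a hypothetical helper and format.log is an assumed file name.

```shell
# Sketch: succeed only if the captured format output contains the
# success marker. format_succeeded is a hypothetical helper.
format_succeeded() {
    grep -q 'Exiting with status 0' "$1"
}

# Usage (format.log is an assumed file name):
#   hadoop namenode -format 2>&1 | tee format.log
#   format_succeeded format.log && echo "format OK"
```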

9. Return to the Hadoop directory and start the cluster (run on the master node only)

cd /usr/hadoop/hadoop-2.6.0
sbin/start-all.sh #Start Hadoop (run on the master node only)

Finally, run jps on the master node and on each slave node to confirm that the Hadoop daemons have started.
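To check the jps listing on each node mechanically, one option is to compare it against the daemon names expected there. The sketch below is an assumption-based helper, not part of Hadoop: missing_daemons is a hypothetical name, and the expected lists assume the layout used in this article, where z1 is the master and also appears in the slaves file.

```shell
# Sketch: read `jps` output on stdin and print each expected daemon
# that is not running. missing_daemons is a hypothetical helper.
missing_daemons() {
    local jps_output name
    jps_output=$(cat)
    for name in "$@"; do
        # jps prints "<pid> <MainClass>" per line; compare the class name
        # exactly so that NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qxF "$name"; then
            echo "$name"
        fi
    done
}

# On the master node (assumed layout):
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager DataNode NodeManager
# On a slave node:
#   jps | missing_daemons DataNode NodeManager
```

Empty output means every expected daemon was found. Any name that is printed points at a daemon whose log under the Hadoop logs directory is worth reading.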
