How to run Hadoop and create images in Docker

Rather than reinventing the wheel, here we repackage existing software to build a Docker-based Hadoop image.

A Hadoop cluster depends on the JDK and SSH, so it is enough to package these two items, together with the Hadoop distribution itself, into the image.

Configuration file preparation

1. Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh (minimal examples of core-site.xml and slaves follow this list)
2. ssh configuration file: ssh_config
3. Hadoop cluster startup file: start-hadoop.sh
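
Minimal sketches of two of these files (the hostnames and the 9000 port are assumptions chosen to match the container names used later in this article):

<!-- core-site.xml: point the default filesystem at the master's NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://solinx-hadoop-master:9000</value>
  </property>
</configuration>

# slaves: one worker hostname per line
solinx-hadoop-slave1
solinx-hadoop-slave2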

Make an image
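
The steps below are fragments of a single Dockerfile. As a minimal sketch, assume it opens like this (the exact Ubuntu version is an assumption; any Debian-based image with apt-get works):

# An Ubuntu base image; RUN commands execute as root, so ~ below resolves to /root
FROM ubuntu:16.04
WORKDIR /root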

1. Installation dependencies

RUN apt-get update && \
    apt-get install -y openssh-server openjdk-8-jdk wget

2. Download Hadoop package

RUN wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz && \
    tar -xzvf hadoop-2.10.0.tar.gz && \
    mv hadoop-2.10.0 /usr/local/hadoop && \
    rm hadoop-2.10.0.tar.gz && \
    rm -rf /usr/local/hadoop/share/doc

3. Configure environment variables

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 
ENV HADOOP_HOME=/usr/local/hadoop 
ENV PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

4. Generate SSH key for password-free node login

RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
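
Password-free login between containers also needs SSH host-key prompts suppressed; that is what the ssh_config file copied in the next step usually provides. A minimal sketch, assuming this is its entire content:

# ssh_config: skip interactive host-key verification between cluster nodes
Host *
  UserKnownHostsFile /dev/null
  StrictHostKeyChecking no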

5. Create the Hadoop-related directories, copy the configuration files into place, add execution permissions to the relevant scripts, and finally format the NameNode. Each node also starts the SSH service when it boots (see the CMD sketch after this block).

RUN mkdir -p ~/hdfs/namenode && \
    mkdir -p ~/hdfs/datanode && \
    mkdir $HADOOP_HOME/logs

COPY config/* /tmp/

# Copy the ssh and hadoop configuration files into place
RUN mv /tmp/ssh_config ~/.ssh/config && \
    mv /tmp/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    mv /tmp/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml && \
    mv /tmp/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml && \
    mv /tmp/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml && \
    mv /tmp/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml && \
    mv /tmp/slaves $HADOOP_HOME/etc/hadoop/slaves && \
    mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \
    mv /tmp/run-wordcount.sh ~/run-wordcount.sh

# Add execution permissions
RUN chmod +x ~/start-hadoop.sh && \
    chmod +x ~/run-wordcount.sh && \
    chmod +x $HADOOP_HOME/sbin/start-dfs.sh && \
    chmod +x $HADOOP_HOME/sbin/start-yarn.sh

# Format the NameNode
RUN /usr/local/hadoop/bin/hdfs namenode -format
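
To start the SSH service whenever a node boots, as step 5 requires, the Dockerfile can end with a CMD; a minimal sketch (the trailing bash keeps the container alive for docker exec):

# Start sshd at container boot, then keep an interactive shell running
CMD [ "sh", "-c", "service ssh start; bash" ]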


Running Hadoop Cluster in Docker

Once the image has been built from the Dockerfile above, it can be used to bring up a Hadoop cluster; here we start one master node and two slave nodes.

Add a bridge network:

docker network create --driver=bridge solinx-hadoop

Start the Master node (50070, the NameNode web UI, is mapped to host port 10070; 8088 is the YARN ResourceManager UI):

docker run -itd --net=solinx-hadoop -p 10070:50070 -p 8088:8088 --name solinx-hadoop-master --hostname solinx-hadoop-master solinx/hadoop:0.1

Start the Slave1 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave1 --hostname solinx-hadoop-slave1 solinx/hadoop:0.1

Start the Slave2 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave2 --hostname solinx-hadoop-slave2 solinx/hadoop:0.1

Enter the Master node and execute the script to start the Hadoop cluster:
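
The command snippet for this step is not shown in the original; a sketch of the usual sequence:

docker exec -it solinx-hadoop-master bash
./start-hadoop.sh

Here start-hadoop.sh is assumed to be a thin wrapper around the stock Hadoop scripts, roughly:

#!/bin/bash
# start-hadoop.sh: bring up HDFS and YARN from the master node
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh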

Summary

The above is my introduction to running Hadoop and building Hadoop images in Docker; I hope you find it helpful. If you have any questions, please leave me a message and I will reply promptly.
