How to run Hadoop and create images in Docker

Rather than reinventing the wheel, we repackage existing components to build a Docker-based Hadoop image.

A Hadoop cluster mainly depends on the JDK and SSH, so as long as these two items plus the Hadoop release itself are packaged into the image, the image can serve as a cluster node.
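
The Dockerfile fragments in the following steps omit the base image; since they rely on apt-get and the openjdk-8-jdk package, a Debian/Ubuntu base is implied. A hypothetical opening line (the exact image and tag are an assumption, not taken from the original):

# Assumption: any Debian/Ubuntu base with apt-get will work
FROM ubuntu:16.04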

Configuration file preparation

1. Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh (an example slaves file follows this list)
2. ssh configuration file: ssh_config
3. Hadoop cluster startup file: start-hadoop.sh
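
As an illustration of item 1, in Hadoop 2.x the slaves file simply lists one worker hostname per line; assuming the container hostnames used later in this article, it would contain:

solinx-hadoop-slave1
solinx-hadoop-slave2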

Make an image

1. Install dependencies

RUN apt-get update && \
 apt-get install -y openssh-server openjdk-8-jdk wget

2. Download Hadoop package

RUN wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz && \
    tar -xzvf hadoop-2.10.0.tar.gz && \
    mv hadoop-2.10.0 /usr/local/hadoop && \
    rm hadoop-2.10.0.tar.gz && \
    rm -rf /usr/local/hadoop/share/doc
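
If the BIT mirror above is unavailable, the same tarball should also be on the Apache archive; the URL below follows the standard archive layout and is an assumption worth verifying:

# Alternative download source (assumed URL, standard Apache archive layout)
RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz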

3. Configure environment variables

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 
ENV HADOOP_HOME=/usr/local/hadoop 
ENV PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
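
Note that Hadoop's startup scripts launch daemons over ssh, and those ssh sessions do not inherit the ENV variables above, which is why hadoop-env.sh must also set JAVA_HOME. The hadoop-env.sh copied in step 5 should therefore contain at least:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64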

4. Generate SSH key for password-free node login

RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
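
The generated key alone still leaves an interactive host-key prompt on first login, which would block the startup scripts. The ssh_config copied in step 5 typically disables that check; a minimal sketch:

Host *
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null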

5. Create the Hadoop data and log directories, copy the configuration files into place, make the relevant scripts executable, and finally format the namenode. Each node should also start the ssh service at boot (a CMD sketch follows this step).

RUN mkdir -p ~/hdfs/namenode && \
    mkdir -p ~/hdfs/datanode && \
    mkdir $HADOOP_HOME/logs

COPY config/* /tmp/

# Copy ssh and hadoop configuration files into place
RUN mv /tmp/ssh_config ~/.ssh/config && \
    mv /tmp/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    mv /tmp/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml && \
    mv /tmp/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml && \
    mv /tmp/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml && \
    mv /tmp/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml && \
    mv /tmp/slaves $HADOOP_HOME/etc/hadoop/slaves && \
    mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \
    mv /tmp/run-wordcount.sh ~/run-wordcount.sh

# Add execute permission
RUN chmod +x ~/start-hadoop.sh && \
    chmod +x ~/run-wordcount.sh && \
    chmod +x $HADOOP_HOME/sbin/start-dfs.sh && \
    chmod +x $HADOOP_HOME/sbin/start-yarn.sh

# Format the namenode
RUN $HADOOP_HOME/bin/hdfs namenode -format
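
The fragment above does not show how the ssh service gets started at container boot, although step 5's description requires it. A plausible closing line for the Dockerfile (an assumption, not taken from the original):

# Start sshd when the container boots, then keep a shell open
CMD [ "sh", "-c", "service ssh start; bash" ]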


Running Hadoop Cluster in Docker

Once the image has been built from the Dockerfile above, it can be used to assemble a Hadoop cluster. Here we start one master and two slave nodes.

Add a bridge network:

docker network create --driver=bridge solinx-hadoop
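
You can confirm it was created with docker network ls. All three containers below join this user-defined bridge network, which lets them resolve each other by hostname.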

Start the Master node:

docker run -itd --net=solinx-hadoop -p 10070:50070 -p 8088:8088 --name solinx-hadoop-master --hostname solinx-hadoop-master solinx/hadoop:0.1
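
The -p flags publish the NameNode web UI (container port 50070 in Hadoop 2.x) on host port 10070, and the YARN ResourceManager UI on host port 8088, so both are reachable from a browser on the host once the cluster is up.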

Start the Slave1 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave1 --hostname solinx-hadoop-slave1 solinx/hadoop:0.1

Start the Slave2 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave2 --hostname solinx-hadoop-slave2 solinx/hadoop:0.1

Enter the Master node and execute the script to start the Hadoop cluster:
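
Based on the container name and the script path set up in the Dockerfile, the commands would be:

docker exec -it solinx-hadoop-master bash
./start-hadoop.sh

start-hadoop.sh itself is not shown above; a minimal sketch, assuming it only needs to bring up HDFS and YARN (matching the scripts given execute permission in step 5):

#!/bin/bash
# Start the HDFS daemons: namenode here, datanodes on the hosts listed in slaves
$HADOOP_HOME/sbin/start-dfs.sh
# Start the YARN daemons: resourcemanager here, nodemanagers on the slaves
$HADOOP_HOME/sbin/start-yarn.sh

Once both finish, the NameNode UI should be reachable at http://localhost:10070 on the host.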

Summarize

The above is how to run Hadoop and create its image in Docker. I hope it is helpful to you; if you have any questions, leave me a message and I will reply in time.

