How to run Hadoop and create images in Docker

Rather than reinventing the wheel, we repackage existing components to build a Docker-based Hadoop image.

A Hadoop cluster mainly depends on the JDK and SSH, so as long as these two items plus the Hadoop release itself are packaged into the image, the image can serve as a cluster node.
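
The Dockerfile fragments in the following steps omit the base image; since they rely on apt-get and the openjdk-8-jdk package, a Debian/Ubuntu base is implied. A hypothetical opening line (the exact image and tag are an assumption, not taken from the original):

# Assumption: any Debian/Ubuntu base with apt-get will work
FROM ubuntu:16.04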

Configuration file preparation

1. Hadoop configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh (an example slaves file follows this list)
2. ssh configuration file: ssh_config
3. Hadoop cluster startup file: start-hadoop.sh
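
As an illustration of item 1, in Hadoop 2.x the slaves file simply lists one worker hostname per line; assuming the container hostnames used later in this article, it would contain:

solinx-hadoop-slave1
solinx-hadoop-slave2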

Make an image

1. Install dependencies

RUN apt-get update && \
 apt-get install -y openssh-server openjdk-8-jdk wget

2. Download Hadoop package

RUN wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz && \
    tar -xzvf hadoop-2.10.0.tar.gz && \
    mv hadoop-2.10.0 /usr/local/hadoop && \
    rm hadoop-2.10.0.tar.gz && \
    rm -rf /usr/local/hadoop/share/doc
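
If the BIT mirror above is unavailable, the same tarball should also be on the Apache archive; the URL below follows the standard archive layout and is an assumption worth verifying:

# Alternative download source (assumed URL, standard Apache archive layout)
RUN wget https://archive.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz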

3. Configure environment variables

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 
ENV HADOOP_HOME=/usr/local/hadoop 
ENV PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
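
Note that Hadoop's startup scripts launch daemons over ssh, and those ssh sessions do not inherit the ENV variables above, which is why hadoop-env.sh must also set JAVA_HOME. The hadoop-env.sh copied in step 5 should therefore contain at least:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64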

4. Generate SSH key for password-free node login

RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
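
The generated key alone still leaves an interactive host-key prompt on first login, which would block the startup scripts. The ssh_config copied in step 5 typically disables that check; a minimal sketch:

Host *
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null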

5. Create the Hadoop data and log directories, copy the configuration files into place, make the relevant scripts executable, and finally format the namenode. Each node should also start the ssh service at boot (a CMD sketch follows this step).

RUN mkdir -p ~/hdfs/namenode && \
    mkdir -p ~/hdfs/datanode && \
    mkdir $HADOOP_HOME/logs

COPY config/* /tmp/

# Copy ssh and hadoop configuration files into place
RUN mv /tmp/ssh_config ~/.ssh/config && \
    mv /tmp/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    mv /tmp/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml && \
    mv /tmp/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml && \
    mv /tmp/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml && \
    mv /tmp/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml && \
    mv /tmp/slaves $HADOOP_HOME/etc/hadoop/slaves && \
    mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \
    mv /tmp/run-wordcount.sh ~/run-wordcount.sh

# Add execute permission
RUN chmod +x ~/start-hadoop.sh && \
    chmod +x ~/run-wordcount.sh && \
    chmod +x $HADOOP_HOME/sbin/start-dfs.sh && \
    chmod +x $HADOOP_HOME/sbin/start-yarn.sh

# Format the namenode
RUN $HADOOP_HOME/bin/hdfs namenode -format
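
The fragment above does not show how the ssh service gets started at container boot, although step 5's description requires it. A plausible closing line for the Dockerfile (an assumption, not taken from the original):

# Start sshd when the container boots, then keep a shell open
CMD [ "sh", "-c", "service ssh start; bash" ]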


Running Hadoop Cluster in Docker

Once the image has been built from the Dockerfile above, it can be used to assemble a Hadoop cluster. Here we start one master and two slave nodes.

Add a bridge network:

docker network create --driver=bridge solinx-hadoop
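
You can confirm it was created with docker network ls. All three containers below join this user-defined bridge network, which lets them resolve each other by hostname.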

Start the Master node:

docker run -itd --net=solinx-hadoop -p 10070:50070 -p 8088:8088 --name solinx-hadoop-master --hostname solinx-hadoop-master solinx/hadoop:0.1
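
The -p flags publish the NameNode web UI (container port 50070 in Hadoop 2.x) on host port 10070, and the YARN ResourceManager UI on host port 8088, so both are reachable from a browser on the host once the cluster is up.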

Start the Slave1 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave1 --hostname solinx-hadoop-slave1 solinx/hadoop:0.1

Start the Slave2 node:

docker run -itd --net=solinx-hadoop --name solinx-hadoop-slave2 --hostname solinx-hadoop-slave2 solinx/hadoop:0.1

Enter the Master node and execute the script to start the Hadoop cluster:
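
Based on the container name and the script path set up in the Dockerfile, the commands would be:

docker exec -it solinx-hadoop-master bash
./start-hadoop.sh

start-hadoop.sh itself is not shown above; a minimal sketch, assuming it only needs to bring up HDFS and YARN (matching the scripts given execute permission in step 5):

#!/bin/bash
# Start the HDFS daemons: namenode here, datanodes on the hosts listed in slaves
$HADOOP_HOME/sbin/start-dfs.sh
# Start the YARN daemons: resourcemanager here, nodemanagers on the slaves
$HADOOP_HOME/sbin/start-yarn.sh

Once both finish, the NameNode UI should be reachable at http://localhost:10070 on the host.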

Summarize

The above is how to run Hadoop and create its image in Docker. I hope it is helpful to you; if you have any questions, leave me a message and I will reply in time.

