Recently I needed to build a Hadoop test cluster at my company, so I used Docker to deploy it quickly.

0. Write in front

There are already many tutorials on the Internet, but they are full of pitfalls, so here I record my own installation process.

Objective: use Docker to build a Hadoop 2.7.7 cluster with one master and two slaves.

Preparation: first, you need a CentOS 7 machine with more than 8 GB of memory; I used an Alibaba Cloud host. Second, upload the JDK and Hadoop packages to the server. I installed Hadoop 2.7.7; the packages are available at: https://pan.baidu.com/s/15n_W-1rqOd2cUzhfvbkH4g (extraction code: vmzw).

1. Steps

The process can be roughly divided into the following steps:

1. Install Docker
2. Prepare the basic images
3. Configure the network and start the Docker containers
4. Configure hosts and SSH password-free login
5. Install and configure Hadoop
6. Test the cluster
1.1 Install Docker

Follow the steps below to install Docker. If you already have a Docker environment, you can skip this step.

    yum update
    yum install -y yum-utils device-mapper-persistent-data lvm2
    yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    yum install -y docker-ce
    systemctl start docker
    docker -v

1.2 Basic Environment Preparation

1.2.1 Create a basic centos7 image

Pull the official centos7 image:

    docker pull centos

Generate a centos image with the ssh service by building a Dockerfile.

Create the Dockerfile:

    vi Dockerfile

Write the following content into the Dockerfile:

    FROM centos
    MAINTAINER mwf

    RUN yum install -y openssh-server sudo
    RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
    RUN yum install -y openssh-clients

    RUN echo "root:qwe123" | chpasswd
    RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
    RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
    RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
    RUN mkdir /var/run/sshd
    EXPOSE 22
    CMD ["/usr/sbin/sshd", "-D"]

The above roughly means: based on the centos image, set the root password to qwe123, install the ssh service, and start it.

Build the Dockerfile:

    docker build -t="centos7-ssh" .

An image named centos7-ssh will be generated.
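As an optional check of my own (not part of the original steps), you can confirm that sshd in the new image really accepts the root password before building on top of it. A minimal sketch, assuming host port 2222 is free and that the image's sshd permits root password login (the later steps rely on this anyway):

```bash
# Optional smoke test of the centos7-ssh image (assumes host port 2222 is free).
docker run -d --name ssh-test -p 2222:22 centos7-ssh
ssh -p 2222 root@localhost   # log in with the password set above: qwe123
docker rm -f ssh-test        # clean up the throwaway container afterwards
```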
1.2.2 Generate an image with the hadoop and jdk environment

We just created a Dockerfile above, so move it out of the way first, then create a new one.

Create the Dockerfile:

    vi Dockerfile

Write the following:

    FROM centos7-ssh

    ADD jdk-8u202-linux-x64.tar.gz /usr/local/
    RUN mv /usr/local/jdk1.8.0_202 /usr/local/jdk1.8
    ENV JAVA_HOME /usr/local/jdk1.8
    ENV PATH $JAVA_HOME/bin:$PATH

    ADD hadoop-2.7.7.tar.gz /usr/local
    RUN mv /usr/local/hadoop-2.7.7 /usr/local/hadoop
    ENV HADOOP_HOME /usr/local/hadoop
    ENV PATH $HADOOP_HOME/bin:$PATH

    RUN yum install -y which sudo

The above roughly means: based on the centos7-ssh image generated above, add the hadoop and jdk packages and configure the environment variables.

Build the Dockerfile:

    docker build -t="hadoop" .

An image named hadoop will be generated.

1.3 Configure the network and start the docker containers

Because the cluster machines must reach each other over the network, configure the network first.

Create a network:

    docker network create --driver bridge hadoop-br

The above command creates a bridge network named hadoop-br.

Specify the network when starting the containers:

    docker run -itd --network hadoop-br --name hadoop1 -p 50070:50070 -p 8088:8088 hadoop
    docker run -itd --network hadoop-br --name hadoop2 hadoop
    docker run -itd --network hadoop-br --name hadoop3 hadoop

The above commands start three machines, with the network specified as hadoop-br.

Check the network status:

    docker network inspect hadoop-br

Execute the above command to see the corresponding network information (abridged):

    [
        {
            "Name": "hadoop-br",
            "Id": "88b7839f412a140462b87a353769e8091e92b5451c47b5c6e7b44a1879bc7c9a",
            "Containers": {
                "86e52eb15351114d45fdad4462cc2050c05202554849bedb8702822945268631": {
                    "Name": "hadoop1",
                    "IPv4Address": "172.18.0.2/16",
                    "IPv6Address": ""
                },
                "9baa1ff183f557f180da2b7af8366759a0d70834f43d6b60fba2e64f340e0558": {
                    "Name": "hadoop2",
                    "IPv4Address": "172.18.0.3/16",
                    "IPv6Address": ""
                },
                "e18a3166e965a81d28b4fe5168d1f0c3df1cb9f7e0cbe0673864779b224c8a7f": {
                    "Name": "hadoop3",
                    "IPv4Address": "172.18.0.4/16",
                    "IPv6Address": ""
                }
            }
        }
    ]

From this we can find the IP addresses of the three machines:

    172.18.0.2 hadoop1
    172.18.0.3 hadoop2
    172.18.0.4 hadoop3
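If you only want the name-to-IP mapping, Docker can print it directly with a Go template instead of dumping the whole JSON. A small convenience of my own, assuming a reasonably recent Docker client:

```bash
# Print just the container name and IPv4 address for every container on hadoop-br.
docker network inspect hadoop-br \
  --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}'
```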
Log in to the docker containers; they can ping each other.

    docker exec -it hadoop1 bash
    docker exec -it hadoop2 bash
    docker exec -it hadoop3 bash

1.4 Configure hosts and ssh password-free login

1.4.1 Configure the hosts

Modify /etc/hosts on each machine separately:

    vi /etc/hosts

Write the following content (note: the IPs allocated by Docker may differ for each person, so fill in your own):

    172.18.0.2 hadoop1
    172.18.0.3 hadoop2
    172.18.0.4 hadoop3

1.4.2 SSH password-free login

Because the ssh service was installed in the image above, execute the following commands directly on each machine:

    ssh-keygen
Press Enter all the way through.
    ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop1
Enter the password; mine is qwe123.
    ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop2
Enter the password; mine is qwe123.
    ssh-copy-id -i /root/.ssh/id_rsa -p 22 root@hadoop3
Enter the password; mine is qwe123.

1.4.3 Test whether the configuration is successful

    ping hadoop1
    ping hadoop2
    ping hadoop3
    ssh hadoop1
    ssh hadoop2
    ssh hadoop3

1.5 Install and configure Hadoop

1.5.1 Operations on hadoop1

Enter hadoop1:

    docker exec -it hadoop1 bash

Create some folders, which will be used in the configuration later:

    mkdir /home/hadoop
    mkdir /home/hadoop/tmp /home/hadoop/hdfs_name /home/hadoop/hdfs_data

Switch to the hadoop configuration directory:

    cd $HADOOP_HOME/etc/hadoop/

Edit core-site.xml (these <property> blocks go inside the existing <configuration> element; the same applies to the other files below):

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131702</value>
    </property>

Edit hdfs-site.xml:

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hdfs_name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/hdfs_data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop1:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
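As an optional sanity check of my own (not part of the original steps), you can ask Hadoop to echo back what it parsed from the two files just edited before moving on; run this inside hadoop1:

```bash
# Optional: confirm the values Hadoop actually reads from core-site.xml / hdfs-site.xml.
hdfs getconf -confKey fs.defaultFS      # expect hdfs://hadoop1:9000
hdfs getconf -confKey dfs.replication   # expect 2
```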
Edit mapred-site.xml. This file does not exist by default, so create it from the template first:

    cp mapred-site.xml.template mapred-site.xml

Then add:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop1:19888</value>
    </property>

Edit yarn-site.xml:

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop1:8088</value>
    </property>

Edit slaves. Here I use hadoop1 as the master node and hadoop2 and hadoop3 as slave nodes:

    hadoop2
    hadoop3

Copy the files to hadoop2 and hadoop3. Execute the following commands in sequence:

    scp -r $HADOOP_HOME hadoop2:/usr/local/
    scp -r $HADOOP_HOME hadoop3:/usr/local/
    scp -r /home/hadoop hadoop2:/
    scp -r /home/hadoop hadoop3:/

1.5.2 Operations on each machine

Connect to each machine separately:

    docker exec -it hadoop1 bash
    docker exec -it hadoop2 bash
    docker exec -it hadoop3 bash

Configure the environment variable for the hadoop sbin directory. The hadoop bin directory was configured when the image was created, but the sbin directory was not, so it needs to be configured separately. On each machine:

    vi ~/.bashrc

Append the following content:

    export PATH=$PATH:$HADOOP_HOME/sbin

Execute:

    source ~/.bashrc

1.5.3 Start Hadoop

Execute the following commands on hadoop1.

Format hdfs:

    hdfs namenode -format

One-click start:

    start-all.sh

If nothing goes wrong, you can celebrate; if something does, good luck.

1.6 Test Hadoop

    jps

    #hadoop1
    1748 Jps
    490 NameNode
    846 ResourceManager
    686 SecondaryNameNode

    #hadoop2
    400 DataNode
    721 Jps
    509 NodeManager

    #hadoop3
    425 NodeManager
    316 DataNode
    591 Jps

Upload a file:

    hdfs dfs -mkdir /mwf
    echo hello > a.txt
    hdfs dfs -put a.txt /mwf
    hdfs dfs -ls /mwf

    Found 1 items
    drwxr-xr-x - root supergroup 0 2020-09-04 11:14 /mwf

(An optional wordcount smoke test is sketched at the end of this article.)

Since it is a cloud server and I did not want to open the ports, I skipped the web UI.

2. Finally

The above is the process I summarized after a successful installation. There should be no problems, but there may be omissions.

3. References

https://cloud.tencent.com/developer/article/1084166
https://cloud.tencent.com/developer/article/1084157?from=10680
https://blog.csdn.net/ifenggege/article/details/108396249
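Following up on the test in section 1.6: as an optional end-to-end check of my own, the example jar shipped with Hadoop 2.7.7 can run a wordcount over the /mwf directory created above. The output path is arbitrary but must not exist yet:

```bash
# Run the bundled wordcount example on hadoop1 against the uploaded file.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar \
  wordcount /mwf /mwf-out

# Inspect the result; it should list each word with its count (here: "hello 1").
hdfs dfs -cat /mwf-out/part-r-00000
```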