1. Environment Preparation

OS: CentOS Linux release 7.5.1804 (Core)

Create the working folders:

$ cd /home/centos
$ mkdir software
$ mkdir module

Import the installation packages into the software folder:

$ cd software
# Then drag the files in

The installation packages used here are:

/home/centos/software/hadoop-3.1.3.tar.gz
/home/centos/software/jdk-8u212-linux-x64.tar.gz

Extract both archives into the module folder:

$ tar -zxvf jdk-8u212-linux-x64.tar.gz -C ../module
$ tar -zxvf hadoop-3.1.3.tar.gz -C ../module

Configure the environment variables:

$ cd /etc/profile.d/
$ vim my_env.sh

To avoid polluting the system variables, we create our own environment variable script. The content is as follows:

#JAVA_HOME, PATH
# export promotes the variable to a global variable.
# If your paths differ from mine, remember to use your own paths here.
export JAVA_HOME=/home/centos/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

#HADOOP_HOME
export HADOOP_HOME=/home/centos/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Then save and exit (if you are not familiar with vim, read up on its basic usage first; I won't go into details here). Source the profile so the environment variables take effect:

$ source /etc/profile

Test whether it succeeded:

$ hadoop version
$ java -version

If both commands print version information, everything is fine. If it still does not work, check that the paths in my_env.sh match your actual install directories, then source the profile again.
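The PATH additions above can be sanity-checked directly from the shell. A minimal sketch, using the JDK and Hadoop paths from this guide (substitute your own):

```shell
# Re-create the exports from my_env.sh, then confirm each bin directory
# actually ended up on PATH. Paths are this guide's; adjust to your install.
export JAVA_HOME=/home/centos/module/jdk1.8.0_212
export HADOOP_HOME=/home/centos/module/hadoop-3.1.3
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

for dir in "$JAVA_HOME/bin" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
  case ":$PATH:" in
    *":$dir:"*) echo "on PATH: $dir" ;;
    *)          echo "MISSING: $dir" ;;
  esac
done
```

If any directory prints as MISSING, the corresponding export in my_env.sh was not picked up by `source /etc/profile`.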
Passwordless SSH

Although this is a pseudo-cluster, a password is still required when the local machine connects to itself, so set up passwordless SSH:

$ ssh-keygen -t rsa

Just keep pressing Enter at the prompts. After the key pair is generated:

$ ssh-copy-id <local hostname>

Configure the hosts file:

$ vi /etc/hosts

# The configuration I keep here maps master to the Tencent Cloud intranet IP.
# If the external IP is configured instead, the Eclipse client will not be able to connect to Hadoop.
::1         localhost.localdomain localhost
::1         localhost6.localdomain6 localhost6
172.16.0.3  master
127.0.0.1   localhost

Modify the hostname:

$ vi /etc/sysconfig/network
# Change HOSTNAME to master
HOSTNAME=master

$ hostnamectl --static set-hostname master

Turn off the firewall:

$ systemctl disable firewalld # Permanent

2. Configure Hadoop Configuration Files

Enter the Hadoop configuration directory; all the configuration files live in this folder:

$ cd /home/centos/module/hadoop-3.1.3/etc/hadoop

The files we need to configure are mainly:

core-site.xml
hdfs-site.xml
hadoop-env.sh
yarn-site.xml
mapred-site.xml
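Before moving on to the config files: what the ssh-keygen and ssh-copy-id steps above achieve for the local machine can be sketched directly. The scratch directory below is only so the sketch is safe to run anywhere; on the real host the files live in ~/.ssh:

```shell
# Sketch of passwordless-SSH setup for the local machine (what
# ssh-keygen + ssh-copy-id do). A scratch dir is used so running this
# cannot clobber an existing key; use ~/.ssh on the real host.
dir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q        # empty passphrase, no prompts
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"    # what ssh-copy-id appends remotely
chmod 700 "$dir" && chmod 600 "$dir/authorized_keys"
echo "public key entries installed: $(wc -l < "$dir/authorized_keys")"
```

With the public key present in the target account's ~/.ssh/authorized_keys, `ssh master` should no longer prompt for a password.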
Then just follow the steps!

$ vim core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Tencent Cloud intranet IP:9820</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/centos/module/hadoop-3.1.3/data/tmp</value>
  </property>
  <!-- Permission to operate HDFS through the web interface -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <!-- For Hive compatibility later -->
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>

$ vim hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Tencent Cloud intranet IP:9868</value>
  </property>
</configuration>

$ vim hadoop-env.sh

export JAVA_HOME=/home/centos/module/jdk1.8.0_212

$ vim yarn-site.xml

<configuration>
  <!-- How the Reducer obtains data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Address of YARN's ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Environment variables inherited by the NodeManagers' containers. For MapReduce
       applications, HADOOP_MAPRED_HOME must be added in addition to the defaults. -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <!-- Disable the memory checks so YARN does not kill the Container when a
       program exceeds the virtual memory limit -->
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- For Hive compatibility later -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Log server access path -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://172.17.0.13:19888/jobhistory/logs</value>
  </property>
  <!-- Keep logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>

Configure the history server:

$ vim mapred-site.xml

<configuration>
  <!-- History server address -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Tencent Cloud intranet IP:10020</value>
  </property>
  <!-- History server web address -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Tencent Cloud intranet IP:19888</value>
  </property>
</configuration>

Initialization

The NameNode needs to be formatted before the first startup, but not afterwards:

$ hdfs namenode -format

After initialization, two folders, data and logs, appear in the Hadoop installation folder, which means the format succeeded. Next, let's start the cluster.
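A malformed edit to any of the *-site.xml files above is a common cause of a failed format or startup, so it is worth confirming the files parse as XML first. A hedged sketch (it shells out to python3 for the parse, and demonstrates on a stand-in file; point CONF_DIR at $HADOOP_HOME/etc/hadoop on the real host):

```shell
# Verify every *-site.xml parses as XML before formatting the NameNode.
# CONF_DIR is a stand-in created here for illustration; on the real host
# set CONF_DIR=$HADOOP_HOME/etc/hadoop and skip the heredoc.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9820</value>
  </property>
</configuration>
EOF

for f in "$CONF_DIR"/*-site.xml; do
  if python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$f" 2>/dev/null; then
    echo "OK: $f"
  else
    echo "BROKEN XML: $f"
  fi
done
```

Any file reported as BROKEN XML should be fixed before running `hdfs namenode -format`.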
$ start-dfs.sh

Startup completed with no error messages. Check the processes:

[root@VM_0_13_centos hadoop]# jps
20032 Jps
30900 DataNode
31355 SecondaryNameNode
30559 NameNode

All started successfully!

One-click start

If all of the above is OK, you can create a script to start the cluster with one command. Create a new file in the bin directory:

$ vim mycluster

Add the following content:

#!/bin/bash
case $1 in
"start")
    # dfs, yarn, history server
    start-dfs.sh
    start-yarn.sh
    mapred --daemon start historyserver
;;
"stop")
    # dfs, yarn, history server
    stop-dfs.sh
    stop-yarn.sh
    mapred --daemon stop historyserver
;;
*)
    echo "args is error! please input start or stop"
;;
esac

Make the script executable:

$ chmod u+x mycluster

Start the cluster using the script:

$ mycluster start
$ jps
23680 NodeManager
24129 JobHistoryServer
22417 DataNode
24420 Jps
22023 NameNode
23384 ResourceManager
22891 SecondaryNameNode

3. View HDFS

Configure security group rules

Before performing the following operations, open the ports this guide uses in the security group rules: at minimum the ones configured above (9820, 9868, 10020, 19888), plus the NameNode web UI port (9870 by default in Hadoop 3) and the YARN web UI port (8088).
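The jps output above can also be checked programmatically after `mycluster start`. A small sketch (check_daemons is a hypothetical helper, demonstrated against a copy of the sample output; on the real host you would pass it "$(jps)"):

```shell
# check_daemons: given jps-style output, report any missing Hadoop daemon.
check_daemons() {
  expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer"
  for d in $expected; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    echo "$1" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}

# Sample copied from the jps listing above; on the real host use: check_daemons "$(jps)"
sample="23680 NodeManager
24129 JobHistoryServer
22417 DataNode
22023 NameNode
23384 ResourceManager
22891 SecondaryNameNode"
check_daemons "$sample"   # prints: all daemons running
```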
Hadoop web pages

Enter the corresponding address and port in the browser. We found that the SecondaryNameNode page did not display properly. This is due to incorrect use of the time function in dfs-dust.js in Hadoop 3; let's correct it manually.

First shut down the cluster:

$ mycluster stop

Modify the file:

$ vim /home/centos/module/hadoop-3.1.3/share/hadoop/hdfs/webapps/static/dfs-dust.js

At about line 61, change the date formatting line to:

return new Date(Number(v)).toLocaleString();

Now restart the cluster:

$ mycluster start

You can see that the SecondaryNameNode web interface now displays correctly.

Testing HDFS

Let's upload a file and have some fun. Create a new folder in the Hadoop directory:

$ mkdir temdatas

Enter the folder and create a test file:

$ vim text.txt

Write whatever you want, save it, and start uploading:

$ hdfs dfs -put text.txt /

Check the web page; the upload succeeded. Try downloading the file as well:

$ hdfs dfs -get /text.txt ./text1.txt

Success!

WordCount example

Create a new folder input on the web UI, then upload a file of assorted words to run word statistics on:

# Or you can create the file in vim and upload it yourself
$ hdfs dfs -put wordcount.txt /input

Then run the wordcount example. Note that the output folder must not already exist:

$ hadoop jar /home/centos/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output

After the job finishes, look at the results:

# Pull the output directory from hdfs
[root@master mydata]# hdfs dfs -get /output ./
# View the results
[root@master output]# cat part-r-00000
a	2
b	3
c	2
d	1
e	1
f	1

At this point, you can play around with Hadoop freely. Of course, if you have tried it, you will find one small remaining problem: clicking a file on the web UI to view its head or tail fails, and downloading through the UI also fails. This did not happen on a local virtual machine install, and I am still investigating the cause.
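What the wordcount example computes can be reproduced in plain shell, which is a handy way to double-check the job output. A sketch using a small made-up input that happens to yield the same counts as the part-r-00000 listing above:

```shell
# Word frequencies with coreutils: split on whitespace, sort, count.
# The input file is made up for illustration; its counts match the
# part-r-00000 output shown above (a 2, b 3, c 2, d 1, e 1, f 1).
printf 'a b b c\nb c d e\na f\n' > wordcount.txt
tr -s ' \t' '\n' < wordcount.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
```

Unlike the MapReduce job, this runs on a single machine, but the word/count pairs it prints should agree with the job's reducer output for the same input.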
If anyone knows what's going on, please leave a message. This is the end of this article on how to build a Hadoop 3.x pseudo-cluster on Tencent Cloud. For more related articles, please search 123WORDPRESS.COM.