This document describes my experience following the Apache document titled “Hadoop Cluster Setup” [1], which covers Hadoop version 3.0.0-Alpha2. It is a successor to the Hadoop Installation Document-Standalone [2].
The “ubuntul_hadoop_master” machine is used in the rest of this document. You will need to read and follow the Hadoop Installation Document-Standalone [2] before reading any further.
A. Prepare the guest environments for slave nodes.
It is easy to clone virtual machines in VirtualBox: right-click “ubuntul_hadoop_master” and select Clone. Name the new VM “ubuntul_hadoop_slave1”. You can create as many slaves as you like.
Since we simply clone the master machine, most of the configuration comes ready. In practice, slave nodes need more disk space while the master node needs more memory, but this is an educational setup, so such details do not matter here.
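The same clone can also be created from the command line. Below is a minimal sketch using VirtualBox's VBoxManage tool; the VM names match the ones used above, and running it from the host's terminal is assumed:

$ VBoxManage clonevm "ubuntul_hadoop_master" --name "ubuntul_hadoop_slave1" --register

The --register switch makes the new VM appear in the VirtualBox GUI immediately, so the result is the same as cloning through the right-click menu.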
B. Install Hadoop
Hadoop comes installed on “ubuntul_hadoop_master”, so every cloned slave already has it as well.
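A quick way to confirm this on each node is to print the Hadoop version. Run it from the Hadoop installation directory; the exact path depends on where you installed Hadoop in [2]:

$ bin/hadoop version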
C. Running the Cluster
On the master node, it is enough to use the utility scripts to start the Hadoop cluster. The first thing to do is to create the etc/hadoop/workers file, which replaces the etc/hadoop/slaves file found in older versions of Hadoop. Each line of the workers file must contain the IP address (or hostname) of one slave node. SSH trusted (passwordless) access from the master to the slaves must also be configured; see [2]. An example workers file is shown below.
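With two slaves, for example, the workers file would look like the following. The IP addresses here are placeholders; use the addresses of your own slave VMs:

$ cat etc/hadoop/workers
192.168.56.101
192.168.56.102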
- Before starting the cluster for the first time, format HDFS:
$ bin/hdfs namenode -format cluster1
- To start HDFS, use the following command. It is recommended to run it as user hdfs:
$ sbin/start-dfs.sh
- To start YARN, use the following command. It is recommended to run it as user yarn:
$ sbin/start-yarn.sh
- If a proxy server is present, run the following command as user yarn to start the WebAppProxy server. If there are multiple WebAppProxy servers, run it on each of them:
$ bin/yarn --daemon start proxyserver
- If a job history server is present, run the following command as user mapred to start the JobHistory server:
$ bin/mapred --daemon start historyserver
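Once everything is up, a simple sanity check (my own addition, not part of [1]) is to list the running Java daemons with jps and to ask the NameNode for a cluster report that includes the live DataNodes:

$ jps
$ bin/hdfs dfsadmin -report

You can also browse the web UIs: in Hadoop 3.x the NameNode listens on port 9870 and the ResourceManager on port 8088 by default.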
D. Stopping the Cluster
- To stop HDFS, use the following command. It is recommended to run it as user hdfs:
$ sbin/stop-dfs.sh
- To stop YARN, use the following command. It is recommended to run it as user yarn:
$ sbin/stop-yarn.sh
- If a proxy server is present, run the following command as user yarn to stop the WebAppProxy server. If there are multiple WebAppProxy servers, run it on each of them:
$ bin/yarn --daemon stop proxyserver
- If a job history server is present, run the following command as user mapred to stop the JobHistory server:
$ bin/mapred --daemon stop historyserver
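If you ever need to stop or restart a single daemon on one node instead of the whole cluster, the same --daemon switch works for HDFS as well. For example, on a slave node (a sketch following the daemon syntax used above):

$ bin/hdfs --daemon stop datanode
$ bin/hdfs --daemon start datanode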