Hadoop Cluster Installation Document

This document describes my experience following the Apache document titled “Hadoop Cluster Setup” [1], which covers Hadoop version 3.0.0-alpha2. It is a successor to the Hadoop Installation Document-Standalone [2].

The “ubuntul_hadoop_master” machine is used in the rest of this document. You will need to read and follow the Hadoop Installation Document-Standalone [2] before reading any further.

A. Prepare the guest environments for slave nodes.


It is easy to clone virtual machines using VirtualBox. Right-click “ubuntul_hadoop_master” and choose Clone. Name the new VM “ubuntul_hadoop_slave1”. You can have as many slaves as you like.
Since we simply clone the master machine, much of the configuration comes ready. In practice, slave nodes need more disk space while the master node needs more memory, but since this is an educational setup, such tuning is unnecessary.
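
One thing that does not come ready is each clone’s identity: a clone keeps the master’s hostname, and all nodes must be able to resolve each other by name. Below is a minimal sketch of the per-clone adjustments, assuming a systemd-based Ubuntu; the IP addresses (a typical VirtualBox host-only range) are illustrative, not prescribed.

    $ sudo hostnamectl set-hostname ubuntul_hadoop_slave1
    $ echo "192.168.56.101 ubuntul_hadoop_master" | sudo tee -a /etc/hosts
    $ echo "192.168.56.102 ubuntul_hadoop_slave1" | sudo tee -a /etc/hosts

The first command is run on the slave clone; the /etc/hosts entries are needed on every node, master included.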

B. Install Hadoop


Hadoop already comes installed on “ubuntul_hadoop_master” (see [2]), and therefore on every slave cloned from it.
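
As a quick check on any node, the version command should work from the Hadoop installation directory (assuming the layout of [2], where commands are run relative to the Hadoop root):

    $ bin/hadoop version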

C. Running the Cluster


On the master node, it is enough to use the utility scripts to start the Hadoop cluster. The first thing to do is to create the etc/hadoop/workers file, which replaces the etc/hadoop/slaves file of older Hadoop versions. Each line of the workers file must contain the IP address (or hostname) of one slave node. Passwordless SSH access from the master to the slaves must also be configured; see [2].
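
For example, with a single slave the workers file contains one line, and passwordless access can be set up by copying the master’s public key to the slave. The IP address is the illustrative one used earlier, and the user name “hduser” is only an assumption; use whatever account runs Hadoop in your setup.

    $ echo "192.168.56.102" > etc/hadoop/workers
    $ ssh-copy-id hduser@192.168.56.102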
  1. Before starting HDFS for the first time, format it:

       $ bin/hdfs namenode -format cluster1

  2. To start HDFS, use the following command. It is recommended to run it as user hdfs.

       $ sbin/start-dfs.sh

  3. To start YARN, use the following command. It is recommended to run it as user yarn.

       $ sbin/start-yarn.sh

  4. If a proxy server is present, run the following command as user yarn to start the WebAppProxy server. If there are multiple WebAppProxy servers, run this command on each of them.

       $ bin/yarn --daemon start proxyserver

  5. If a job history server is present, run the following command as user mapred to start the JobHistory server.

       $ bin/mapred --daemon start historyserver
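Once everything is up, the cluster can be sanity-checked with standard commands: jps (shipped with the JDK) lists the running Java daemons on a node, and the HDFS and YARN command-line tools report the live slaves. On the master you should see NameNode and ResourceManager; on the slaves, DataNode and NodeManager.

    $ jps
    $ bin/hdfs dfsadmin -report
    $ bin/yarn node -list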

D. Stopping the Cluster


  1. To stop HDFS, use the following command. It is recommended to run it as user hdfs.

       $ sbin/stop-dfs.sh

  2. To stop YARN, use the following command. It is recommended to run it as user yarn.

       $ sbin/stop-yarn.sh

  3. If a proxy server is present, run the following command as user yarn to stop the WebAppProxy server. If there are multiple WebAppProxy servers, run this command on each of them.

       $ bin/yarn --daemon stop proxyserver

  4. If a job history server is present, run the following command as user mapred to stop the JobHistory server.

       $ bin/mapred --daemon stop historyserver
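
As a final check, running jps on each node should show that the Hadoop daemons are gone once the scripts above complete.

    $ jps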

Links


  1. Apache Hadoop Cluster Setup
  2. Hadoop Installation Document-Standalone
