Hadoop Cluster Installation Document

This document describes my experience following the Apache document titled “Hadoop Cluster Setup” [1], which covers Hadoop version 3.0.0-Alpha2. It is the successor to the Hadoop Installation Document - Standalone [2].

The “ubuntul_hadoop_master” machine is used throughout the rest of this document. You will need to read and follow the Hadoop Installation Document - Standalone [2] before reading any further.

A. Prepare the guest environments for slave nodes.


It is easy to clone virtual machines in VirtualBox. Right-click “ubuntul_hadoop_master” and choose Clone. Name the new VM “ubuntul_hadoop_slave1”. You can create as many slaves as you like.
Since we simply clone the master machine, much of the configuration comes ready. In practice, slave nodes need more disk space while the master node needs more memory, but for this educational setup such details are not important.

B. Install Hadoop


Hadoop is already installed on “ubuntul_hadoop_master”, so the cloned slaves come with it as well.

C. Running the Cluster


On the master node, it is enough to use the utility scripts to start the Hadoop cluster. The first thing to do is to create the etc/hadoop/workers file, which replaces the etc/hadoop/slaves file used in older versions of Hadoop. Each line of the workers file must contain the IP address of one slave node. You also need to configure trusted (passwordless) SSH access; see [2].
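As an illustration, here is one way to create a workers file for two slaves. The IP addresses below are hypothetical; use the host-only adapter addresses of your own slave VMs:

```shell
# Create etc/hadoop/workers with one slave IP address per line.
# 192.168.56.102 and 192.168.56.103 are example addresses only.
mkdir -p etc/hadoop
cat > etc/hadoop/workers <<'EOF'
192.168.56.102
192.168.56.103
EOF
```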
  1. Before starting, format HDFS:

       $ bin/hdfs namenode -format cluster1

  2. Start HDFS with the following command. It is recommended to run it as user hdfs:

       $ sbin/start-dfs.sh

  3. Start YARN with the following command. It is recommended to run it as user yarn:

       $ sbin/start-yarn.sh

  4. If a proxy server is present, run the following command as user yarn to start the WebAppProxy server. If there are multiple WebAppProxy servers, run it on each of them:

       $ bin/yarn --daemon start proxyserver

  5. If a job history server is present, run the following command as user mapred to start the JobHistory server:

       $ bin/mapred --daemon start historyserver
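Under the hood, start-dfs.sh reads etc/hadoop/workers and starts a DataNode on each listed host over SSH. The sketch below only prints the per-host commands instead of executing them, so no running cluster is required; the IP addresses are hypothetical examples:

```shell
# Simulate how the start script iterates over the workers file.
# We echo the ssh commands rather than running them.
printf '192.168.56.102\n192.168.56.103\n' > workers
while read -r host; do
  echo "ssh $host \"hdfs --daemon start datanode\""
done < workers
```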

D. Stopping the Cluster


  1. Stop HDFS with the following command. It is recommended to run it as user hdfs:

       $ sbin/stop-dfs.sh

  2. Stop YARN with the following command. It is recommended to run it as user yarn:

       $ sbin/stop-yarn.sh

  3. If a proxy server is present, run the following command as user yarn to stop the WebAppProxy server. If there are multiple WebAppProxy servers, run it on each of them:

       $ bin/yarn --daemon stop proxyserver

  4. If a job history server is present, run the following command as user mapred to stop the JobHistory server:

       $ bin/mapred --daemon stop historyserver

Links


  1. Apache Hadoop Cluster Setup
  2. Hadoop Installation Document - Standalone
