Installing Hadoop in Pseudo Distributed Mode


The aim of this article is to walk the reader through setting up Hadoop 2.7.1 in Pseudo Distributed Mode (= Single Node Cluster) on Ubuntu 14.04. The article does not cover Hadoop basics (e.g. what a NameNode is, what a DataNode is, what HDFS is, etc.). OK, let’s go.

Installation walkthrough

1) First of all, if it is not already installed, install Java 7 (the headless OpenJDK 7 JRE is enough to run Hadoop):

root@roadrunner:/home/edi# apt-get install openjdk-7-jre-headless

2) Create a user called hadoop:

root@roadrunner:/home/edi# adduser hadoop
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...

3) Add user hadoop to sudo group:

root@roadrunner:/home/edi# adduser hadoop sudo
Adding user `hadoop' to group `sudo' ...
Adding user hadoop to group sudo

4) Switch to user hadoop:

root@roadrunner:/home/edi# su hadoop
hadoop@roadrunner:/home/edi$ cd ~

5) Download and unpack the Hadoop framework:

hadoop@roadrunner:~$ wget
hadoop@roadrunner:~$ tar -zxf hadoop-2.7.1.tar.gz

6) Move the unpacked framework to /usr/local/:

hadoop@roadrunner:~$ sudo mv hadoop-2.7.1 /usr/local/
[sudo] password for hadoop:

7) Find out your Java home location, as you will need it in the next step.
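
One way to find the Java home is to resolve the symlinks behind the java binary and strip the trailing /bin/java. The snippet below is only a sketch of that idea; the helper name java_home_from_bin is made up for illustration, and it assumes GNU readlink (which Ubuntu ships).

```shell
#!/bin/sh
# Hypothetical helper: derive a Java home from the full path of a java
# binary by stripping the trailing "/bin/java".
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Resolve the symlink chain behind `java` (on Ubuntu it usually goes through
# /etc/alternatives) and print the resulting Java home, if java is installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
else
    echo "java not found on PATH" >&2
fi
```

The printed directory is the value to use for JAVA_HOME in the following steps.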

8) Edit the .bashrc of user hadoop:

hadoop@roadrunner:~$ vi .bashrc

While in the editor, add this at the end of .bashrc (please note that your JAVA_HOME location may differ from mine):

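export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
export HADOOP_HOME=/usr/local/hadoop-2.7.1
# make the hadoop commands (bin) and the start/stop scripts (sbin) available on the PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin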

8) Open a new terminal, switch to the hadoop user and check that Hadoop is working by running its version command:

edi@roadrunner:~$ su hadoop
hadoop@roadrunner:/home/edi$ hadoop version
Hadoop 2.7.1
Subversion -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar

Everything is fine so far, so let’s continue.

9) Set Java home in

hadoop@roadrunner:~$ vi $HADOOP_HOME/etc/hadoop/

While in the editor, search for JAVA_HOME and edit it accordingly (please note that your JAVA_HOME location may differ from mine):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386/jre
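
If you prefer not to edit by hand, the same change can be scripted with sed. This is a sketch under two assumptions: that the file meant here is hadoop-env.sh (the file name is missing above, but this is where Hadoop 2.x reads JAVA_HOME), and that it contains a line starting with "export JAVA_HOME=". To stay harmless, the example works on a scratch file that imitates that line.

```shell
#!/bin/sh
# Scratch file standing in for hadoop-env.sh (assumption: the real file
# contains an "export JAVA_HOME=..." line like this one).
env_file="$(mktemp)"
printf '%s\n' '# The java implementation to use.' 'export JAVA_HOME=${JAVA_HOME}' > "$env_file"

# Your Java home; this value is the one used in this article.
java_home="/usr/lib/jvm/java-7-openjdk-i386/jre"

# Replace the whole JAVA_HOME line; "|" is the sed delimiter because the
# path itself contains slashes. -i edits the file in place (GNU sed).
sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"

cat "$env_file"
```

To apply it for real, point env_file at the actual file instead of the scratch copy.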

10) Edit core-site.xml. Hadoop will be configured to run locally on port 9000, using HDFS as the default filesystem.

hadoop@roadrunner:~$ vi $HADOOP_HOME/etc/hadoop/core-site.xml

Change the configuration element to this:

<name> </name>
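
The configuration listing did not survive intact here, so the following is only a sketch of what a core-site.xml matching the description (HDFS on localhost, port 9000) could look like. The property name fs.defaultFS is my assumption; Hadoop 2.x also accepts the older fs.default.name. The snippet writes into a scratch directory unless you set CONF_DIR yourself.

```shell
#!/bin/sh
# Write to a scratch directory by default; set CONF_DIR to
# "$HADOOP_HOME/etc/hadoop" to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name: default filesystem is HDFS on localhost:9000 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml"
```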

11) Edit hdfs-site.xml. Hadoop will be configured to use separate storage locations for the NameNode and the DataNode. As we have only one DataNode, we set the data replication factor to 1.

hadoop@roadrunner:~$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Change the configuration element to this:


<value>file:///home/hadoop/hadoopinfra/hdfs/namenode </value>

<value>file:///home/hadoop/hadoopinfra/hdfs/datanode </value>
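
Only the two value lines of the listing are left above, so here is a sketch of a complete hdfs-site.xml built around them. The property names (dfs.replication, dfs.namenode.name.dir, dfs.datanode.data.dir) are the Hadoop 2.x names and are my assumption about what the original listing used. As before, the file goes to a scratch directory unless CONF_DIR is set.

```shell
#!/bin/sh
# Write to a scratch directory by default; set CONF_DIR to
# "$HADOOP_HOME/etc/hadoop" to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- One DataNode only, so keep a single copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Assumed Hadoop 2.x property names for the two storage locations -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/hdfs-site.xml"
```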

12) Edit yarn-site.xml. Configure YARN, a framework for job scheduling and cluster resource management.

hadoop@roadrunner:~$ vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

Change the configuration element to this:
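
The listing for this step is missing. A minimal yarn-site.xml for a single node usually just enables the shuffle service that MapReduce jobs need on the NodeManager; the sketch below assumes that this is what was configured here (property yarn.nodemanager.aux-services with value mapreduce_shuffle). It writes to a scratch directory unless CONF_DIR is set.

```shell
#!/bin/sh
# Write to a scratch directory by default; set CONF_DIR to
# "$HADOOP_HOME/etc/hadoop" to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumed setting: auxiliary shuffle service for MapReduce on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/yarn-site.xml"
```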


13) Copy mapred-site.xml.template to mapred-site.xml and edit the copy. This configures MapReduce to run on YARN.

hadoop@roadrunner:~$ cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
hadoop@roadrunner:~$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml

Change the configuration element to this:
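
The listing for this step is missing as well. Going by the description (MapReduce should run on YARN), a sketch of the file could look like this; the property name mapreduce.framework.name is my assumption about what the original listing contained. The file is written to a scratch directory unless CONF_DIR is set.

```shell
#!/bin/sh
# Write to a scratch directory by default; set CONF_DIR to
# "$HADOOP_HOME/etc/hadoop" to write the real file.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Assumed setting: run MapReduce jobs on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/mapred-site.xml"
```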


14) Configuration done; now format the filesystem:

hadoop@roadrunner:~$ hadoop namenode -format

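You will see a bunch of info messages on the terminal when executing the above command. (Hadoop 2.x also prints a deprecation notice for this form of the command; hdfs namenode -format is the current equivalent.) Afterwards, if everything is ok, a new directory "hadoopinfra" has been created in the home directory of user hadoop, as specified in hdfs-site.xml: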

hadoop@roadrunner:~$ ls
hadoop-2.7.1.tar.gz  hadoopinfra
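
If you want a slightly stronger check than ls, you can look for the VERSION file that formatting writes into the current subdirectory of the NameNode storage directory. A small sketch; the helper name namenode_is_formatted is made up, and the default path assumes the NameNode location from step 11.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given NameNode storage directory
# contains the current/VERSION file that formatting creates.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Default to the NameNode directory used in this article; override with NAME_DIR.
name_dir="${NAME_DIR:-$HOME/hadoopinfra/hdfs/namenode}"

if namenode_is_formatted "$name_dir"; then
    echo "NameNode directory looks formatted: $name_dir"
else
    echo "no current/VERSION found under $name_dir"
fi
```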

15) Install OpenSSH. Hadoop needs SSH to execute operations such as starting and stopping the DFS NameNode and DataNode daemons in the cluster.

hadoop@roadrunner:~$ sudo apt-get install openssh-server

16) We need a key pair that Hadoop uses for SSH communication (important: without a passphrase):

hadoop@roadrunner:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/
The key fingerprint is:
hadoop@roadrunner:~$ cat ~/.ssh/ >> ~/.ssh/authorized_keys
hadoop@roadrunner:~$ chmod 0600 ~/.ssh/authorized_keys
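
The same key setup can also be done without the interactive prompts. The following is a sketch, not a transcript of what I ran: it assumes the default key name id_rsa and works in a scratch directory unless you set SSH_DIR="$HOME/.ssh".

```shell
#!/bin/sh
# Use a scratch directory by default; set SSH_DIR="$HOME/.ssh" for the real thing.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"
KEY_FILE="$SSH_DIR/id_rsa"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -f names the private key file, -q keeps it quiet.
    # Skip generation if a key already exists so nothing gets overwritten.
    [ -f "$KEY_FILE" ] || ssh-keygen -q -t rsa -N "" -f "$KEY_FILE"
    # ssh-keygen stores the public half next to the private key with a .pub suffix.
    cat "$KEY_FILE.pub" >> "$SSH_DIR/authorized_keys"
    chmod 0600 "$SSH_DIR/authorized_keys"
else
    echo "ssh-keygen not found; install the OpenSSH client first" >&2
fi
```

When run against the real ~/.ssh directory, ssh localhost should afterwards log you in without asking for a password.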

17) Start the NameNode and DataNode daemons:

hadoop@roadrunner:~$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-hadoop-namenode-roadrunner.out
localhost: starting datanode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-hadoop-datanode-roadrunner.out
Starting secondary namenodes [] starting secondarynamenode, logging to /usr/local/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-roadrunner.out

18) Start YARN:

hadoop@roadrunner:~$ start-yarn.sh

19) Check that the NameNode web application is working: http://localhost:50070


20) Check that the YARN ResourceManager web application is working: http://localhost:8088/
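
If you are on a machine without a browser, both web applications can be probed from the terminal. A sketch, assuming curl is installed; the helper name check_ui is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a URL answers an HTTP request.
# $1 is a label for the output, $2 is the URL to probe.
check_ui() {
    # -s silent, -f treat HTTP errors (>= 400) as failure,
    # -o /dev/null discards the body, --max-time bounds the wait.
    if curl -sf -o /dev/null --max-time 5 "$2"; then
        echo "$1: reachable"
    else
        echo "$1: not reachable"
    fi
}

check_ui "NameNode" "http://localhost:50070"
check_ui "ResourceManager" "http://localhost:8088/"
```

"not reachable" usually means the corresponding daemon is not running yet; the log files named in the start-up output are the first place to look.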



If you followed my walkthrough successfully, you should now have a Hadoop installation running in pseudo-distributed mode on your (local) machine. As the installation is just the beginning, I wish you a nice time exploring the Hadoop framework 😉
