
How to Install Hadoop (2.7.3) on Ubuntu (16.04 LTS)



As I am planning to learn Hadoop, I wanted to install Hadoop (2.7.3) on my Ubuntu (16.04 LTS) machine, so I followed the steps in the documentation on the Apache Hadoop website. I ran into a few problems along the way and spent some time finding solutions to them.

Below are the steps I followed. At the end of this post I describe the error I hit, what I had missed, and what caused it.


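Step 1. Download the Hadoop installation file. For Hadoop 2.7.3 I used the following link:

http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz

Note that this link points to the source tarball (-src), which has to be compiled before it can run. To follow the steps below as written, download the binary tarball hadoop-2.7.3.tar.gz from the same mirror directory instead.

Step 2. Assuming you have downloaded the file into the /home/<username>/Downloads folder, extract it:

$ cd ~/Downloads
$ tar zxf hadoop-2.7.3.tar.gz

This will extract the files into a folder named "hadoop-2.7.3".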
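A quick way to confirm that you extracted a runnable binary release, and not the source tarball, is to check for the launcher scripts used in the later steps. This is a minimal sketch assuming the folder layout of the 2.7.3 binary release; `check_hadoop_dist` is a hypothetical helper name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if DIR looks like an extracted Hadoop
# binary release. The -src tarball has no ready-to-run bin/ and sbin/ scripts
# at its top level, so it fails this check.
check_hadoop_dist() {
  local dir="$1" f
  if [ -z "$dir" ]; then
    echo "usage: check_hadoop_dist <hadoop-folder>" >&2
    return 2
  fi
  for f in bin/hadoop bin/hdfs sbin/start-dfs.sh sbin/start-yarn.sh; do
    if [ ! -x "$dir/$f" ]; then
      echo "missing or not executable: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir looks like a Hadoop binary release"
}
```

For example, run check_hadoop_dist ~/Downloads/hadoop-2.7.3 right after extracting.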

Step 3. I moved the folder to /home/<username>. Many guides suggest moving it into /usr/local, but I prefer to keep it in my home folder for now; maybe once I learn more about Linux I will change that.

$ mv hadoop-2.7.3 /home/<username>/
Step 4. Install ssh and rsync:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Step 5. Edit the hadoop-env.sh file located at

            /home/<username>/hadoop-2.7.3/etc/hadoop/hadoop-env.sh

Find this line: export JAVA_HOME=${JAVA_HOME}
and replace ${JAVA_HOME} with the location of your Java installation. To find where Java is located, run the command "whereis java". For me it was at the path below:
       /usr/bin/java
I replaced ${JAVA_HOME} with "/usr/". This works because the Hadoop scripts run $JAVA_HOME/bin/java, which then resolves to /usr/bin/java.
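If you would rather script this step, here is a minimal sketch. It assumes GNU coreutils and GNU sed (the defaults on Ubuntu); `detect_java_home` and `set_java_home` are hypothetical helper names, not part of Hadoop. Since /usr/bin/java is normally a symlink, `readlink -f` follows the chain to the real binary, and the Java home is taken as the folder two levels above it, so JAVA_HOME points at the actual JDK (or JRE) folder rather than /usr. The exact path depends on which Java package you installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Java home directory, i.e. the folder that
# contains bin/java, following symlinks such as /usr/bin/java.
detect_java_home() {
  local java_bin
  java_bin=$(command -v java) || return 1
  # readlink -f (GNU coreutils) resolves the whole symlink chain,
  # e.g. through /etc/alternatives/java to the real binary.
  dirname "$(dirname "$(readlink -f "$java_bin")")"
}

# Hypothetical helper: rewrite the "export JAVA_HOME=..." line of hadoop-env.sh.
set_java_home() {
  local env_file="$1" java_home="$2"
  if [ -z "$env_file" ] || [ -z "$java_home" ]; then
    echo "usage: set_java_home <hadoop-env.sh> <java-home>" >&2
    return 2
  fi
  # GNU sed -i edits the file in place; "|" is the delimiter because the
  # replacement contains slashes.
  sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
}
```

For example: set_java_home ~/hadoop-2.7.3/etc/hadoop/hadoop-env.sh "$(detect_java_home)". If no java is found on the PATH, the second argument is empty and set_java_home refuses to edit the file.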

Step 6. Prepare to start the Hadoop cluster

From inside the hadoop-2.7.3 directory, type the following command:
$ bin/hadoop
This displays the usage documentation for the hadoop script, which means you are on the right path :)

Step 7. Configuration

a. Edit the file hadoop-2.7.3/etc/hadoop/core-site.xml, paste the following between <configuration> and </configuration>, and save:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
b. Edit the file hadoop-2.7.3/etc/hadoop/hdfs-site.xml, paste the following between <configuration> and </configuration>, and save:

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
c. In Hadoop 2.7.3 the file hadoop-2.7.3/etc/hadoop/mapred-site.xml does not exist yet, so first create it from the bundled template (from inside the hadoop-2.7.3 folder):
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Then paste the following between <configuration> and </configuration> and save:
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

d. Edit the file hadoop-2.7.3/etc/hadoop/yarn-site.xml, paste the following between <configuration> and </configuration>, and save:
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
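The four edits above can also be applied in one go. This is a minimal sketch, assuming you are happy to overwrite each file completely (the license comment at the top of the stock files is dropped, which Hadoop does not need); `write_site` and `write_pseudo_config` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write DIR/FILE as a minimal Hadoop *-site.xml that
# holds a single NAME/VALUE property (overwrites any existing file).
write_site() {
  cat > "$1/$2" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$3</name>
    <value>$4</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: apply the four Step 7 settings. CONF_DIR defaults to
# etc/hadoop, i.e. it assumes you run it from inside the hadoop-2.7.3 folder.
write_pseudo_config() {
  local conf_dir="${1:-etc/hadoop}"
  mkdir -p "$conf_dir" &&
  write_site "$conf_dir" core-site.xml fs.defaultFS hdfs://localhost:9000 &&
  write_site "$conf_dir" hdfs-site.xml dfs.replication 1 &&
  write_site "$conf_dir" mapred-site.xml mapreduce.framework.name yarn &&
  write_site "$conf_dir" yarn-site.xml yarn.nodemanager.aux-services mapreduce_shuffle
}
```

Because it writes mapred-site.xml directly, the template copy is not needed with this approach. Back up the original files first if you have already customized them.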
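Step 8. Set up SSH

Check whether you can ssh to localhost without a passphrase by executing the following command:
$ ssh localhost
If you cannot, execute the following. Older instructions, including the Apache documentation for this release, create a DSA key (-t dsa), but OpenSSH 7.0 and later (Ubuntu 16.04 ships 7.2) reject DSA keys by default, so use an RSA key instead:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys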
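A rerunnable version of this step is sketched below, assuming OpenSSH's ssh-keygen and the default ~/.ssh layout; `setup_local_ssh` is a hypothetical helper name. It creates the key only if it is missing (so an existing key is never overwritten) and appends it to authorized_keys only once. It uses RSA because OpenSSH 7.0 and later reject DSA keys by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: key-based SSH to localhost with no passphrase.
# Safe to rerun: an existing key is reused and is authorized only once.
setup_local_ssh() {
  local dir="$HOME/.ssh"
  local key="$dir/id_rsa"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  if [ ! -f "$key" ]; then
    # RSA key, empty passphrase (-P ''), quiet output (-q).
    ssh-keygen -t rsa -P '' -f "$key" -q || return 1
  fi
  touch "$dir/authorized_keys" || return 1
  # Append the public key only if that exact line is not already present.
  if ! grep -qxF "$(cat "$key.pub")" "$dir/authorized_keys"; then
    cat "$key.pub" >> "$dir/authorized_keys"
  fi
  chmod 0600 "$dir/authorized_keys"
}
```

Afterwards, ssh localhost should log in without a passphrase; the first time, you will still be asked to accept the host key.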


Step 9. Execute and test the setup

a. Format HDFS (assuming you are in the hadoop-2.7.3 folder):
$ bin/hdfs namenode -format
b. Start the NameNode and DataNode daemons with the following command:
$ sbin/start-dfs.sh
Now you can browse the NameNode web interface at http://localhost:50070/

c. Create the HDFS directories required to run MapReduce jobs with the following commands:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
d. When you are done, you can stop these daemons with the following command:
$ sbin/stop-dfs.sh
e. You can start the ResourceManager and NodeManager daemons with the following command:
$ sbin/start-yarn.sh
Now you can browse the ResourceManager web interface at http://localhost:8088/

f. You can stop these daemons with the following command:
$ sbin/stop-yarn.sh
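Steps b to f can be wrapped in two small functions so that HDFS and YARN are started together (MapReduce jobs need both) and stopped in reverse order. This is a minimal sketch, assuming HADOOP_HOME points at your hadoop-2.7.3 folder and that the NameNode has already been formatted once in step a (formatting again wipes the HDFS metadata). `start_pseudo_cluster` and `stop_pseudo_cluster` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start HDFS, create /user/<username>, then start YARN.
start_pseudo_cluster() {
  local home="$HADOOP_HOME" user="${1:-$(whoami)}"
  if [ -z "$home" ]; then echo "set HADOOP_HOME first" >&2; return 2; fi
  "$home/sbin/start-dfs.sh" || return 1
  # -p creates /user as well and does not fail if the folders already exist.
  "$home/bin/hdfs" dfs -mkdir -p "/user/$user" || return 1
  "$home/sbin/start-yarn.sh"
}

# Hypothetical helper: stop YARN first, then HDFS.
stop_pseudo_cluster() {
  local home="$HADOOP_HOME"
  if [ -z "$home" ]; then echo "set HADOOP_HOME first" >&2; return 2; fi
  "$home/sbin/stop-yarn.sh"
  "$home/sbin/stop-dfs.sh"
}
```

For example: export HADOOP_HOME=~/hadoop-2.7.3, run start_pseudo_cluster, check the two web interfaces, and run stop_pseudo_cluster when you are done.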


Background story
I initially skipped the configuration step (Step 7) and got the following error when starting the NameNode:

475 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.

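If you encounter this error, check your configuration files, in particular that core-site.xml sets fs.defaultFS to hdfs://localhost:9000 as in Step 7a. Without that setting, Hadoop falls back to the local file system (file:///), which has no host and port for the NameNode.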
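To check for this cause quickly, the sketch below prints the fs.defaultFS value from a core-site.xml, or fails if the property is missing. It is a rough line-based check rather than an XML parser: it assumes the <name> and <value> lines are laid out as in Step 7a and relies on GNU grep's -A option. `check_default_fs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print fs.defaultFS from a core-site.xml, or fail if it
# is not set (Hadoop then falls back to file:///, causing the error above).
check_default_fs() {
  local site="${1:-etc/hadoop/core-site.xml}" value
  if [ ! -f "$site" ]; then echo "not found: $site" >&2; return 1; fi
  # Take the <name> line plus the line after it, then extract the <value>.
  value=$(grep -A1 '<name>fs.defaultFS</name>' "$site" \
    | sed -n 's|.*<value>\(.*\)</value>.*|\1|p' | head -n 1)
  if [ -z "$value" ]; then
    echo "fs.defaultFS is not set in $site" >&2
    return 1
  fi
  echo "$value"
}
```

Hadoop can also report the effective value itself with bin/hdfs getconf -confKey fs.defaultFS, which takes all loaded configuration files into account.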

You can also refer to the Apache Hadoop documentation for installation, running the example jobs, and further explanation: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html (this link shows the latest release, whose ports and commands differ slightly from 2.7.3).
