As I am planning to learn Hadoop, I wanted to install Hadoop (2.7.3) on my Ubuntu (16.04 LTS) machine, and I followed the steps mentioned in the documentation on the Apache Hadoop website. I encountered a few problems along the way and spent some time finding solutions to them.
Below are the steps I followed; the description of the errors is at the end of this post, along with what I missed and what caused them.
Step 1. Download the Hadoop installation file. For Hadoop version 2.7.3 I used the following link
http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.3/
and download the file named "hadoop-2.7.3.tar.gz".
Update: The link is no longer valid as it has been archived; you can find the hadoop-2.7.3.tar.gz file at: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/
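If you prefer to download from the terminal, something like this should work (assuming wget is installed, and using the archive link above):
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz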
Step 2. Extract the downloaded file with the following command
$ tar zxf hadoop-2.7.3.tar.gz
This will extract the files into a folder "hadoop-2.7.3".
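You can confirm the extraction with a quick listing; the bin, etc, and sbin folders referenced in the later steps should all be there:
$ ls hadoop-2.7.3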
Step 3. I moved the folder to /home/<username> (many suggest moving it into /usr/local, but I prefer to keep it here; maybe once I learn more about Linux I will get into that, as of now I am fine with my current setup).
$ mv hadoop-2.7.3 /home/kiran/
Step 4. Install ssh and rsync
$ sudo apt-get install ssh
$ sudo apt-get install rsync
Step 5. Edit the hadoop-env.sh file located at
/home/<username>/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
Find this line:
export JAVA_HOME=${JAVA_HOME}
Replace ${JAVA_HOME} with the location of Java. To find where Java is located, execute the command "whereis java" and you will get the path. For me it was located at the below path:
/usr/bin/java
Since JAVA_HOME should point to the directory that contains bin/java, I replaced ${JAVA_HOME} with "/usr/".
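After the edit, the line in hadoop-env.sh looks like this:
export JAVA_HOME=/usr/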
Step 6. Prepare to start the Hadoop cluster
Assuming you are within the hadoop-2.7.3 directory, type the following command
$ bin/hadoop
This will display the usage documentation for the hadoop script. This means you are on the right path :)
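You can also run a quick sanity check; if JAVA_HOME is set correctly, this prints the version information:
$ bin/hadoop version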
Step 7. Configuration
a. Edit the file hadoop-2.7.3/etc/hadoop/core-site.xml, and between <configuration></configuration> paste the following and save
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
b. Edit the file hadoop-2.7.3/etc/hadoop/hdfs-site.xml, and between <configuration></configuration> paste the following and save
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
c. Edit the file hadoop-2.7.3/etc/hadoop/mapred-site.xml (if this file does not exist, copy mapred-site.xml.template in the same folder to mapred-site.xml first), and between <configuration></configuration> paste the following and save
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
d. Edit the file hadoop-2.7.3/etc/hadoop/yarn-site.xml, and between <configuration></configuration> paste the following and save
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
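For reference, here is roughly what the finished core-site.xml looks like (the XML declaration and stylesheet lines at the top may differ slightly in your copy):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>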
Step 8. Set up SSH
Check if you can ssh to localhost without a passphrase by executing the following command
$ ssh localhost
If you cannot, then execute the following
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
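Now "$ ssh localhost" should log you in without asking for a passphrase; type "exit" to return to your own shell before continuing.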
Step 9. Execution and testing the setup
a. Format the HDFS, assuming you are in the hadoop-2.7.3 folder
$ bin/hdfs namenode -format
b. Start the NameNode and DataNode daemons with the following command
$ sbin/start-dfs.sh
Now you can browse the NameNode web interface at http://localhost:50070/
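To verify the daemons actually started, you can use the jps tool that ships with the JDK; after start-dfs.sh it should list NameNode, DataNode, and SecondaryNameNode processes:
$ jps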
c. Create the folders required to run MapReduce jobs with the following commands
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
d. You can stop the daemons with the following command
$ sbin/stop-dfs.sh
e. You can start the ResourceManager and NodeManager daemons with the following command
$ sbin/start-yarn.sh
Now you can browse the ResourceManager web interface at http://localhost:8088/
f. You can stop the daemons with the following command
$ sbin/stop-yarn.sh
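To test the whole setup end to end, you can run one of the bundled example jobs while the HDFS and YARN daemons are running. This is roughly the grep example from the Apache documentation linked at the end of this post; it copies the config files into HDFS as input and searches them for a pattern:
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat output/*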
Background story
I missed the configuration step and encountered the following error
475 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
If you encounter this error, check whether your configuration files are correct.
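A quick way to check which NameNode address Hadoop actually picked up is the following (assuming you are in the hadoop-2.7.3 folder); if it prints file:/// instead of hdfs://localhost:9000, the core-site.xml change did not take effect:
$ bin/hdfs getconf -confKey fs.defaultFS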
You can also refer to the Apache Hadoop documentation for installation, executing the test jobs, and further explanation: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html