How to install Hadoop 2.7.0 on macOS with YARN

This post describes how to install Hadoop 2.7.0 on macOS in a few simple steps. Let's see how to do that:
1) First, download hadoop-2.7.0.tar.gz. You can get it from the Apache Hadoop downloads page.
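If you prefer the command line, the tarball can also be fetched from the Apache archive (the exact mirror URL may differ):

curl -O https://archive.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz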
2) Untar the file using the command:

tar -xvf hadoop-2.7.0.tar.gz

This extracts the contents into the folder hadoop-2.7.0.

3) If you list the folder contents, you will see:
drwxr-xr-x@ 13 chikki staff 442 Jul 7 21:59 .

drwxr-xr-x 5 chikki staff 170 Jul 9 21:53 ..
drwxr-xr-x@ 13 chikki staff 442 Apr 11 00:21 bin
drwxr-xr-x@ 3 chikki staff 102 Apr 11 00:21 etc
drwxr-xr-x@ 7 chikki staff 238 Apr 11 00:21 include
drwxr-xr-x@ 3 chikki staff 102 Apr 11 00:21 lib
drwxr-xr-x@ 12 chikki staff 408 Apr 11 00:21 libexec
-rw-r--r--@ 1 chikki staff 15429 Apr 11 00:21 LICENSE.txt
drwxr-xr-x 28 chikki staff 952 Jul 10 22:33 logs
-rw-r--r--@ 1 chikki staff 101 Apr 11 00:21 NOTICE.txt
-rw-r--r--@ 1 chikki staff 1366 Apr 11 00:21 README.txt
drwxr-xr-x@ 30 chikki staff 1020 Apr 11 00:21 sbin
drwxr-xr-x@ 4 chikki staff 136 Apr 11 00:21 share
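
The later steps refer to the extracted folder through $HADOOP_DIR and $HADOOP_HOME. As a sketch, assuming you extracted the tarball into your home directory, you can set both in your shell profile (adjust the path if you extracted it elsewhere):

export HADOOP_HOME=$HOME/hadoop-2.7.0
export HADOOP_DIR=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin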

4) Now generate an SSH key pair using the command:

>ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/chikki/.ssh/id_rsa): test
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in test.
Your public key has been saved in test.pub.
The key fingerprint is:
6b:3c:ee:41:99:25:c8:aa:4f:c5:ab:c7:1b:50:54:ef chikki@Chikkis-MacBook-Pro.local
The key's randomart image is:
+--[ RSA 2048]----+
| ... |
| o . . |
| + . o |
| + * |
| o o S E |
| . o + . |
| . ..o * |
| o .o+ o |
| o..oo |
+-----------------+

5) Add this key to the ~/.ssh folder by appending the public key to authorized_keys, as shown below.
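
Assuming the key pair was saved under the name test as in the prompt above, a minimal sketch (skip the mv if ~/.ssh/id_rsa already exists):

cat test.pub >> ~/.ssh/authorized_keys
mv test ~/.ssh/id_rsa
chmod 600 ~/.ssh/authorized_keys ~/.ssh/id_rsa
ssh localhost

On macOS you may also need to enable Remote Login under System Preferences > Sharing before ssh localhost will connect.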


6) Edit the hadoop-env.sh file located at $HADOOP_DIR/etc/hadoop/ and update the JAVA_HOME path to the directory where the JDK is installed on the machine. I have used:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk/Contents/Home/
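
Alternatively, on macOS the built-in java_home utility can locate the installed JDK for you, which avoids hard-coding the version:

export JAVA_HOME=$(/usr/libexec/java_home)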

7) Edit the core-site.xml file located at $HADOOP_DIR/etc/hadoop/:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
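
To sanity-check that the setting is picked up (assuming $HADOOP_HOME/bin is on your PATH as set above), you can query the configuration:

hdfs getconf -confKey fs.defaultFS

This should print hdfs://localhost:9000.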

8) Create folders for the namenode and datanode using the commands:

mkdir -p $HOME/hadoop2_data/hdfs/namenode

mkdir -p $HOME/hadoop2_data/hdfs/datanode

The location can be anywhere the Hadoop user has access to.
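
A quick check that the layout is in place:

ls -R $HOME/hadoop2_data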

9) Edit the hdfs-site.xml file located at $HADOOP_DIR/etc/hadoop/. Point dfs.namenode.name.dir and dfs.datanode.data.dir at the absolute paths of the folders created in step 8 (the paths below are from my setup; replace them with yours):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/Users/hadoopuser/Documents/hadoop/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/Users/hadoopuser/Documents/hadoop/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>

10) Edit the mapred-site.xml file located at $HADOOP_DIR/etc/hadoop/. In Hadoop 2.7.0 this file does not exist by default, so create it from the bundled template first (see the command below), then add:
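
A minimal way to create it is to copy the template that ships with the distribution:

cp $HADOOP_DIR/etc/hadoop/mapred-site.xml.template $HADOOP_DIR/etc/hadoop/mapred-site.xml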

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

11) Edit the yarn-site.xml file located at $HADOOP_DIR/etc/hadoop/:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

12) All the setup is done. Now run the command below to format the namenode directory (do this only once; reformatting erases HDFS metadata):

$HADOOP_HOME/bin/hdfs namenode -format

13) Now use the command below to start HDFS (this starts the NameNode, DataNode, and SecondaryNameNode):

$HADOOP_HOME/sbin/start-dfs.sh
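
Since this setup configures YARN as well, start the YARN daemons too and then verify that everything is running with jps:

$HADOOP_HOME/sbin/start-yarn.sh
jps

jps should list processes such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager. The ResourceManager web UI is available at http://localhost:8088.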

This is all for the Hadoop setup. The NameNode web UI can be accessed at http://localhost:50070.

In the next blog we will see how to put files into an HDFS directory.
