Starting with the installation of CentOS/Ubuntu and Hadoop, and setting up a workspace environment.
Download Ubuntu from:- http://www.ubuntu.com/download/desktop
If you choose to use CentOS:- http://www.centos.org/modules/tinycontent/index.php?id=15
You can download the VM Player from here:- https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/4_0
Once you have downloaded all of these, follow the steps below.
1) Extract VM Player, assign the RAM (2GB recommended), and point it to your downloaded Ubuntu or CentOS image to get started.
2) Power on your virtual machine in VM Player and, from the menu, choose Virtual Machine --> Install VMware Tools.
3) To enable SSH, open a terminal and run:- sudo apt-get install openssh-server
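If you are on CentOS instead of Ubuntu, a rough equivalent (using the stock openssh-server package) is:
yum install openssh-server
service sshd start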
4) Download Hadoop CDH3 from the link below onto your Ubuntu/CentOS VM. I chose to download only the individual tarballs I actually needed: Hadoop, Hive, Pig, and HBase.
https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
5) Create a directory structure in your VM:- I use /opt/isv/downloads and put all the downloaded tarballs there, as shown below.
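A quick sketch of setting that layout up, assuming your login user and group are both called hadoop (use your own):
sudo mkdir -p /opt/isv/downloads
sudo chown -R hadoop:hadoop /opt/isv    # hadoop:hadoop is just an example owner
Then move the downloaded tarballs into /opt/isv/downloads.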
6) Extract the tarball hadoop-0.20.2-cdh3u4.tar and move the result to /opt/isv.
i) /opt/isv/downloads> tar -xvf hadoop-0.20.2-cdh3u4.tar
ii) Move the extracted directory to /opt/isv:- /opt/isv/downloads> mv hadoop-0.20.2-cdh3u4 /opt/isv
iii) Extract Pig, Hive and HBase similarly and move them to /opt/isv.
iv) sudo chown -R <userid>:<groupId> <HADOOP_HOME>
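For example, assuming your login user and group are both named hadoop:
sudo chown -R hadoop:hadoop /opt/isv/hadoop-0.20.2-cdh3u4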
7) Download Java for CentOS:- enter this in your terminal: yum install java-1.7.0-openjdk
8) Download Java for Ubuntu:- sudo apt-get install openjdk-6-jdk
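Either way, it's worth confirming the JDK is on your path before moving on:
java -version
which java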
9) Now update the files below in the directory /opt/isv/hadoop-0.20.2-cdh3u4/conf
1) core-site.xml:-
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/isv/hadoop/data</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2) mapred-site.xml:-
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
3) hdfs-site.xml:-
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4) hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk    # point this at wherever your JDK actually lives
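Before starting Hadoop, also make sure the directory you set for hadoop.tmp.dir in core-site.xml exists and is writable by your user, for example:
mkdir -p /opt/isv/hadoop/data    # use the same path you put in core-site.xml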
5) Edit ~/.bashrc and add the following:- vi ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk    # adjust to your JDK install path
export HADOOP_HOME=/opt/isv/hadoop-0.20.2-cdh3u4
export PIG_INSTALL=/opt/isv/pig-0.9.1
export HIVE_INSTALL=/opt/isv/hive-0.7.1
export HBASE_HOME=/opt/isv/hbase-0.90.4
export ZOOKEEPER_INSTALL=/opt/isv/zookeeper-3.3.3
export MAVEN_INSTALL=/opt/isv/apache-maven-3.0.3
export SQOOP_HOME=/opt/isv/sqoop-1.3.0
export MYSQL_HOME=/opt/isv/mysql-5.5.17-osx10.6-x86
export CASSANDRA_HOME=/opt/isv/apache-cassandra-1.1.0
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PIG_INSTALL/bin
export PATH=$PATH:$HIVE_INSTALL/bin:$HBASE_HOME/bin
export PATH=$PATH:$ZOOKEEPER_INSTALL/bin:$MYSQL_HOME/bin
export PATH=$PATH:$MAVEN_INSTALL/bin:$SQOOP_HOME/bin
export PATH=$PATH:$CASSANDRA_HOME/bin
echo $PATH
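These exports only take effect in new shells; to pick them up in the current terminal, re-read the file:
source ~/.bashrc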
6) Format the namenode (only needed the first time):- hadoop namenode -format
7) Then start the Hadoop daemons from the terminal:- start-all.sh
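To check that everything came up, run jps (it ships with the JDK); in pseudo-distributed mode you should see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker listed:
jps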
8) Have fun programming Hadoop.
9) Just in case it asks for passwords, add your user to /etc/sudoers:
<username> ALL=(ALL) NOPASSWD: ALL
10) To generate an SSH key pair:- ssh-keygen -t rsa -P ""
11) To update the keys:- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
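The start scripts ssh into localhost, so confirm the key works and that you are no longer prompted for a password:
ssh localhost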