Sunday, December 9, 2012

Oozie Installation


I recently installed Oozie and ran a workflow from the bundled samples. Since the whole effort turned out to be a little cumbersome, I thought I would post it on my blog in case it helps someone.

Prerequisites:- I use Mac OS X Lion, and here is what I have at hand:
 a) Hadoop installed at /Users/hadoop/hadoop-0.20.2, referred to as HADOOP_HOME
 b) Java installed, version 1.6.0_33
 c) Maven 3.0.x installed under /Users/hadoop/apache-maven-3.0.3
 d) For readability, when I mention the Hadoop config, it is under $HADOOP_HOME/conf.
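
Throughout the post I will use shell variables for these locations to keep the commands readable. A minimal setup, assuming the paths above (OOZIE_HOME is defined in step 2 below; adjust the directory name to whatever your tarball extracts to):

export HADOOP_HOME=/Users/hadoop/hadoop-0.20.2
export OOZIE_HOME=/Users/hadoop/oozie-3.2.0-incubating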

1) Download Oozie from the site:-  Oozie Download

2) Extract the downloaded tarball to /Users/hadoop/; the extracted directory is referred to as OOZIE_HOME.

3) Open the terminal and go to the directory $OOZIE_HOME/bin.

4) Run the script mkdistro.sh -DskipTests:- This step downloads all the jars required by Oozie and creates the incubating packages under $OOZIE_HOME/distro/target, as sketched below.
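
A minimal sketch of the build step (the first run takes a while, since Maven has to download every dependency):

cd $OOZIE_HOME/bin
./mkdistro.sh -DskipTests
# on success, the distribution lands under:
ls $OOZIE_HOME/distro/target/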

5) Download ext-2.2.zip, the ExtJS library that Oozie uses to serve its web console, where you can see the jobs that are scheduled, running, etc. You can download this from the link:- ExtJs2.2.zip

6) Create oozie.war by running the following command:-  $OOZIE_HOME/distro/target/oozie-3.2.0-incubating-distro/oozie-3.2.0-incubating/bin/oozie-setup.sh -hadoop 0.20.2 $HADOOP_HOME -extjs /Users/hadoop/ext-2.2.zip

7) The above step builds oozie.war from the incubating project, bundles the required Hadoop jars, and adds the ExtJS library to the oozie-server webapp.

8) Now copy the newly created oozie.war from the embedded Tomcat's webapps directory into the webapp source tree (we will rebuild the war from there later, in the troubleshooting section):
cp $OOZIE_HOME/distro/target/oozie-3.2.0-incubating-distro/oozie-3.2.0-incubating/oozie-server/webapps/oozie.war $OOZIE_HOME/webapp/src/main/webapp/oozie.war

9) Edit the configuration file $OOZIE_HOME/distro/target/oozie-3.2.0-incubating-distro/oozie-3.2.0-incubating/conf/oozie-site.xml and add:

     <property>
        <name>oozie.service.JPAService.create.db.schema</name>
        <value>true</value>
        <description>
            Creates the Oozie DB.
            If set to true, it creates the DB schema if it does not exist; if the schema already exists, this is a no-op.
            If set to false, it does not create the DB schema, and startup fails if the schema does not exist.
        </description>
    </property>

10) Edit the Hadoop config core-site.xml and add the lines below, which allow Oozie to connect to Hadoop on behalf of other users. Note that Hadoop does not expand variables in property names, so replace [oozie-user] with the Unix user the Oozie server runs as (hadoop, in my setup):
<property>
     <name>hadoop.proxyuser.[oozie-user].hosts</name>
      <value>*</value>
 </property>

 <property>
      <name>hadoop.proxyuser.[oozie-user].groups</name>
      <value>*</value>
 </property>
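
If Hadoop was already running, restart it so the proxyuser settings are picked up; with the 0.20 scripts that is:

$HADOOP_HOME/bin/stop-all.sh
$HADOOP_HOME/bin/start-all.sh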

11) Make sure Hadoop itself is running now as well.
12) Start the Oozie server by running the following command:- $OOZIE_HOME/distro/target/oozie-3.2.0-incubating-distro/oozie-3.2.0-incubating/bin/oozie-start.sh. You can check that it is running by opening http://localhost:11000/oozie in the browser, or from the command line as shown below.
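
You can also verify the server from the command line; the admin status call below is the standard Oozie CLI/web-services check and should report NORMAL:

$OOZIE_HOME/bin/oozie admin -oozie http://localhost:11000/oozie -status
curl http://localhost:11000/oozie/v1/admin/status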

Now, if everything runs smoothly, let's see if we can run an example.

Steps to run an example on Oozie.
1) Extract examples.jar and place its contents under $OOZIE_HOME/examples/target/.
2) Inside that directory you will see the different example apps. Modify job.properties in whichever application you want to run, then put the examples directory on HDFS (step 2.2).
 2.1) For example, open $OOZIE_HOME/examples/target/examples/apps/map-reduce/job.properties and set:

nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/hadoop/${examplesRoot}/apps/map-reduce
outputDir=map-reduce

(nameNode comes from the Hadoop config core-site.xml; jobTracker from mapred-site.xml.)
2.2) hadoop fs -put $OOZIE_HOME/examples/target/examples /user/hadoop/examples

3) Run the example:- $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/target/examples/apps/map-reduce/job.properties -run

4) If your job is submitted successfully, you can watch it run on the Oozie console; the command-line equivalent is sketched below.
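
For reference, here is the whole submit-and-check sequence as a sketch; the job ID is whatever -run prints back, and exporting OOZIE_URL saves you from passing -oozie on every call:

export OOZIE_URL=http://localhost:11000/oozie
$OOZIE_HOME/bin/oozie job -config $OOZIE_HOME/examples/target/examples/apps/map-reduce/job.properties -run
# prints something like: job: <job-id>
$OOZIE_HOME/bin/oozie job -info <job-id>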

This is where my troubles started.
1) 500 Internal Server Error:- When I checked oozie.log, I found that all the services defined in oozie-default.xml have to be listed in oozie-site.xml, as below.

<property>
        <name>oozie.services</name>
        <value>
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.CallableQueueService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.HadoopAccessorService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.DagXLogInfoService,
            org.apache.oozie.service.SchemaService,
            org.apache.oozie.service.LiteWorkflowAppService,
            org.apache.oozie.service.JPAService,
            org.apache.oozie.service.StoreService,
            org.apache.oozie.service.CoordinatorStoreService,
            org.apache.oozie.service.SLAStoreService,
            org.apache.oozie.service.DBLiteWorkflowStoreService,
            org.apache.oozie.service.CallbackService,
            org.apache.oozie.service.ActionService,
            org.apache.oozie.service.ActionCheckerService,
            org.apache.oozie.service.RecoveryService,
            org.apache.oozie.service.PurgeService,
            org.apache.oozie.service.CoordinatorEngineService,
            org.apache.oozie.service.BundleEngineService,
            org.apache.oozie.service.DagEngineService,
            org.apache.oozie.service.CoordMaterializeTriggerService,
            org.apache.oozie.service.StatusTransitService,
            org.apache.oozie.service.PauseTransitService,
            org.apache.oozie.service.GroupsService,
            org.apache.oozie.service.ProxyUserService
        </value>
        <description>
            All services to be created and managed by Oozie Services singleton.
            Class names must be separated by commas.
        </description>
    </property>

2) This solved my problem to some extent. Then I saw some NPEs, which made me realize that oozie.war had not been built properly when I ran oozie-setup.sh, specifically while copying the Hadoop libs into the Oozie lib directory. I had to copy the jar guava-r09-jarjar.jar by hand and re-create oozie.war.
    2.1) Copy the jar from the Hadoop lib directory to $OOZIE_HOME/webapp/src/main/webapp/WEB-INF/lib.
    2.2) From inside $OOZIE_HOME/webapp/src/main/webapp, run: jar cf oozie.war *
    2.3) Restart Hadoop and Oozie and run the example again with the command:-
            $OOZIE_HOME/bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/target/examples/apps/map-reduce/job.properties -run
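
Putting the fix together as a sketch. I am assuming the guava jar sits in $HADOOP_HOME/lib; adjust the source path to wherever it lives in your install:

cd $OOZIE_HOME/webapp/src/main/webapp
cp $HADOOP_HOME/lib/guava-r09-jarjar.jar WEB-INF/lib/
jar cf oozie.war *
# put the rebuilt war back where the server picks it up
cp oozie.war $OOZIE_HOME/distro/target/oozie-3.2.0-incubating-distro/oozie-3.2.0-incubating/oozie-server/webapps/oozie.war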

If the job is submitted, you can see it on the console along with its progress.



Have fun OOZEIIIINNNNGGGG!!!!!


Sunday, July 22, 2012

Hadoop and Ubuntu

I am restarting my blog after a long time, but not with regular articles. This time I would like to share my knowledge of Hadoop, and keep a backup of my work at the same time.

Starting with the installation of CentOS/Ubuntu and Hadoop, and setting up a workspace environment.

Download Ubuntu from:- http://www.ubuntu.com/download/desktop

If you choose to use CentOS:- http://www.centos.org/modules/tinycontent/index.php?id=15

You can download the VM Player from here:- https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/4_0

Once you download all these, follow the below steps.

1) Extract VMware Player, assign the RAM (2GB recommended), and point it at your Ubuntu or CentOS download to get started.
2) Power on your VM and, from the menu, choose Virtual Machine --> Install VMware Tools.
3) To enable SSH, open a terminal:- sudo apt-get install openssh-server
4) Download Hadoop CDH3 from the link below in your Ubuntu/CentOS. I chose to download individual tarballs to get only the ones I needed: Hadoop, Hive, Pig, HBase.
https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs
5) Create a directory structure in your Ubuntu:- I use /opt/isv/downloads and put all the downloaded tarballs here.
6) Extract the tarball hadoop-0.20.2-cdh3u4.tar and move it under /opt/isv.
      i) /opt/isv/downloads> tar -xvf hadoop-0.20.2-cdh3u4.tar
     ii) Move the extracted directory to /opt/isv:- /opt/isv/downloads> mv hadoop-0.20.2-cdh3u4 /opt/isv
    iii) Extract Pig, Hive and HBase similarly and move them to /opt/isv.
     iv) sudo chown -R <userid>:<groupId> <HADOOP_HOME>
7) Install Java on CentOS:- enter this in your terminal: yum install java-1.7.0-openjdk
8) Install Java on Ubuntu:- sudo apt-get install openjdk-6-jdk
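Either way, a quick sanity check that the JDK landed on the path (it prints the installed version):
      java -version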
9) Now update the below files in the directory /opt/isv/hadoop-0.20.2-cdh3u4/conf
      1) core-site.xml:-
 <configuration>
      <property>
           <name>hadoop.tmp.dir</name>
           <value>/Users/hadoop/data</value>
      </property>
      <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
     </property>
</configuration>
   2) mapred-site.xml:-
   <configuration>
        <property>
             <name>mapred.job.tracker</name>
             <value>localhost:9001</value>
        </property>
  </configuration>
3) hdfs-site.xml:-
     <configuration>
          <property>
              <name>dfs.replication</name>
              <value>1</value>
        </property>
     </configuration>
4) hadoop-env.sh
export JAVA_HOME=/Library/Java/Home;
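
Note: /Library/Java/Home is a Mac OS X location, and the same goes for the /Users/hadoop/data value of hadoop.tmp.dir above; pick paths that exist on your VM. On Ubuntu with the OpenJDK 6 package installed above, JAVA_HOME is typically /usr/lib/jvm/java-6-openjdk; you can confirm where the active java binary lives with:

readlink -f $(which java)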

10) Edit ~/.bashrc and add the following:- vi ~/.bashrc


export JAVA_HOME=/Library/Java/Home
export HADOOP_HOME=/opt/isv/hadoop-0.20.2-cdh3u4
export PIG_INSTALL=/opt/isv/pig-0.9.1
export HIVE_INSTALL=/opt/isv/hive-0.7.1
export HBASE_HOME=/opt/isv/hbase-0.90.4
export ZOOKEEPER_INSTALL=/opt/isv/zookeeper-3.3.3
export MAVEN_INSTALL=/opt/isv/apache-maven-3.0.3
export SQOOP_HOME=/opt/isv/sqoop-1.3.0
export MYSQL_HOME=/opt/isv/mysql-5.5.17-osx10.6-x86
export CASSANDRA_HOME=/opt/isv/apache-cassandra-1.1.0

export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PIG_INSTALL/bin
export PATH=$PATH:$HIVE_INSTALL/bin:$HBASE_HOME/bin
export PATH=$PATH:$ZOOKEEPER_INSTALL/bin:$MYSQL_HOME/bin
export PATH=$PATH:$MAVEN_INSTALL/bin:$SQOOP_HOME/bin
export PATH=$PATH:$CASSANDRA_HOME/bin

echo $PATH
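
After saving, reload the profile and sanity-check that the Hadoop binaries resolve:

source ~/.bashrc
which hadoop
hadoop version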

11) Format the namenode first (one-time step):- hadoop namenode -format
12) Then start the daemons from the terminal:- start-all.sh
13) Have fun programming Hadoop.
14) Just in case sudo asks for a password, add your user to /etc/sudoers:
<username>      ALL=(ALL)       NOPASSWD: ALL
15) To generate public keys:- ssh-keygen -t rsa -P ""
16) To update the keys:- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
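
A quick verification sketch, assuming the single-node setup above; passwordless SSH is what the start scripts rely on, and jps lists the running Hadoop daemons:

ssh localhost      # should log in without prompting for a password
jps                # should show NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker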