13.1. Oozie Introduction, Apache Hadoop course by CloudxLab Official channel (video No. 69)
Oozie is a Java web application used to schedule Apache Hadoop jobs. It combines multiple jobs sequentially into one logical unit of work and executes them. It is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box, such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop and DistCp, as well as system-specific jobs such as Java programs and shell scripts.
There are two types of Oozie jobs:
- Workflow jobs
- Coordinator jobs
Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) that specify a sequence of actions to execute. A DAG is a finite directed graph with no cycles, so following its edges can never lead back to a task that was already visited. As shown in the image, task 10 can only be executed after tasks 11 and 3 have been executed.
Examples: many task execution systems model their work as a DAG, and most source control management systems store the revision history as a DAG.
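As a small sketch of how a DAG fixes the execution order, the snippet below uses Python's standard `graphlib` module (Python 3.9+) on a hypothetical set of task numbers in which, as in the example, task 10 depends on tasks 11 and 3. It only illustrates dependency ordering in general; it is not how Oozie itself schedules actions.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each key lists the tasks that must finish
# before it can start. Task 10 waits for tasks 11 and 3.
dependencies = {
    11: {5, 7},
    8: {7, 3},
    2: {11},
    9: {11, 8},
    10: {11, 3},
}

# static_order() returns every task (including 5, 7 and 3, which have
# no prerequisites) in an order that respects all dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A graph with a cycle is not a DAG, so no valid order exists and
# graphlib raises CycleError.
has_cycle = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

Several valid orders may exist; any of them runs task 10 only after 11 and 3 have finished.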
Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.
Example: suppose we want to move data from HDFS into Hive every hour. We can define an Oozie workflow that takes the data from HDFS and loads it into Hive, and then run that workflow as a coordinator job with a one-hour frequency.
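A minimal sketch of such a coordinator definition, generated with Python's `xml.etree.ElementTree`: the `frequency` attribute uses Oozie's `${coord:hours(1)}` EL function so the workflow is triggered every hour. The coordinator name, workflow path, dates and the schema version `uri:oozie:coordinator:0.4` are assumptions for illustration; check them against the Oozie version on your cluster. Data-availability triggers (datasets and input events) are left out.

```python
import xml.etree.ElementTree as ET

# Assumed coordinator schema version; use the one your Oozie supports.
COORD_NS = "uri:oozie:coordinator:0.4"

def hourly_coordinator(name, workflow_path, start, end):
    """Build a minimal coordinator definition that runs the workflow
    stored at workflow_path once every hour between start and end."""
    app = ET.Element("coordinator-app", {
        "xmlns": COORD_NS,
        "name": name,
        "frequency": "${coord:hours(1)}",  # Oozie EL function: every hour
        "start": start,
        "end": end,
        "timezone": "UTC",
    })
    action = ET.SubElement(app, "action")
    workflow = ET.SubElement(action, "workflow")
    ET.SubElement(workflow, "app-path").text = workflow_path
    return ET.tostring(app, encoding="unicode")

xml_text = hourly_coordinator(
    name="hdfs-to-hive-hourly",                          # hypothetical
    workflow_path="${nameNode}/user/demo/hdfs-to-hive",  # hypothetical
    start="2024-01-01T00:00Z",
    end="2024-12-31T00:00Z",
)
print(xml_text)
```

The workflow at `app-path` holds the actual HDFS-to-Hive logic; the coordinator only decides when it runs.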
[Oozie - Use Case]
Let's understand a use case where we will be using Oozie.
Let's say we want to push web server access logs to HDFS and run the Spark MLlib recommendation algorithm every day to generate recommendations and display them to the user. The typical steps would be:
- Flume agents run on the web servers and push the access logs to HDFS.
- A Pig script takes the access logs from HDFS, cleans them and pushes the cleaned data back to HDFS.
- Spark takes the cleaned data from HDFS, runs the MLlib recommendation algorithm on it and pushes the recommendations to HDFS.
- Sqoop takes the recommendations from HDFS and exports them to MySQL.
- The web server reads the recommendations from MySQL and displays them to the end user.
In this use case, the Pig, Spark and Sqoop steps can be combined into one Oozie workflow and run daily as an Oozie Coordinator job.
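Each action in an Oozie workflow declares where to go next on success (`ok`) and on failure (`error`). The Python sketch below is a hypothetical, simplified model of that control flow for the daily pipeline, not Oozie's API: the action names are invented, and `run` only simulates which path a run would take.

```python
# Hypothetical action names for the daily batch part of the use case.
# Flume (continuous ingestion) and the web server (serving) are not
# scheduled batch jobs, so they stay outside the workflow.
PIPELINE = ["clean-logs-pig", "recommend-spark", "export-sqoop"]

def chain_transitions(actions, end="end", kill="fail"):
    """Return {action: (ok_target, error_target)} for a linear workflow:
    on success each action hands off to the next one, the last one goes
    to the end node, and any failure goes to the kill node."""
    transitions = {}
    for i, name in enumerate(actions):
        ok_target = actions[i + 1] if i + 1 < len(actions) else end
        transitions[name] = (ok_target, kill)
    return transitions

def run(transitions, start, succeeded):
    """Follow transitions from start; succeeded(name) reports whether
    that action worked. Returns the visited nodes, ending at end/kill."""
    path, node = [], start
    while node in transitions:
        path.append(node)
        ok_target, error_target = transitions[node]
        node = ok_target if succeeded(node) else error_target
    path.append(node)
    return path

transitions = chain_transitions(PIPELINE)
print(run(transitions, PIPELINE[0], lambda name: True))
print(run(transitions, PIPELINE[0], lambda name: name != "recommend-spark"))
```

In this model, if the Spark step fails the run goes straight to the kill node, so the Sqoop export is skipped for that day.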
[Oozie - Workflow - XML]
We define Oozie workflows in XML files. Let's see a sample XML for a MapReduce operation: in it we define the map class, the reduce class, and the input and output directories. We can configure similar workflows for other actions such as Hive and Pig.
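As a hedged sketch of what such a definition can look like, the Python code below builds a minimal `workflow.xml` with `xml.etree.ElementTree`. It assumes the `uri:oozie:workflow:0.5` schema and the classic `mapred.*` property names used by Oozie's bundled MapReduce example; the mapper and reducer class names and the HDFS paths are hypothetical. `${jobTracker}` and `${nameNode}` are parameters normally supplied through a `job.properties` file.

```python
import xml.etree.ElementTree as ET

# Assumed workflow schema version; use the one your Oozie supports.
WF_NS = "uri:oozie:workflow:0.5"

def mapreduce_workflow(name, mapper, reducer, input_dir, output_dir):
    """Build a workflow: start -> one map-reduce action -> end (or kill)."""
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": name})
    ET.SubElement(app, "start", {"to": "mr-node"})

    action = ET.SubElement(app, "action", {"name": "mr-node"})
    mr = ET.SubElement(action, "map-reduce")
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    # Delete the output directory first so a re-run does not fail.
    prepare = ET.SubElement(mr, "prepare")
    ET.SubElement(prepare, "delete", {"path": output_dir})
    # Map class, reduce class, input and output directories.
    conf = ET.SubElement(mr, "configuration")
    for key, value in [
        ("mapred.mapper.class", mapper),
        ("mapred.reducer.class", reducer),
        ("mapred.input.dir", input_dir),
        ("mapred.output.dir", output_dir),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "MapReduce failed: ${wf:errorMessage(wf:lastErrorNode())}")
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")

wf_xml = mapreduce_workflow(
    name="wordcount-wf",                      # hypothetical
    mapper="com.example.WordCountMapper",     # hypothetical class
    reducer="com.example.WordCountReducer",   # hypothetical class
    input_dir="${nameNode}/user/demo/input",  # hypothetical path
    output_dir="${nameNode}/user/demo/output",
)
print(wf_xml)
```

A Hive or Pig workflow keeps the same start, ok/error, kill and end skeleton and swaps the `map-reduce` element for the corresponding action element.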
This Big Data tutorial will help you learn HDFS, ZooKeeper, Hive, HBase, NoSQL, Oozie, Flume, Sqoop, Spark, Spark RDD, Spark Streaming, Kafka, SparkR, SparkSQL, MLlib, and GraphX from scratch. Everything in this course is explained with relevant examples, so you will know how to actually implement the topics you learn.
Let us know in the comments below if you find it helpful.
In order to claim the certificate from E&ICT Academy, IIT Roorkee, visit https://bit.ly/cxlyoutube
________
Website https://www.cloudxlab.com
Facebook https://www.facebook.com/cloudxlab
Instagram https://www.instagram.com/cloudxlab
Twitter http://www.twitter.com/cloudxlab