5 Executing Oozie Workflows
5.1 Executing Oozie Workflows with Oracle Data Integrator
To execute Oozie workflows with Oracle Data Integrator, set up the Oozie runtime engine, execute or deploy an Oozie workflow, and then audit the Hadoop logs.
The following table summarizes the steps you need to perform to execute Oozie Workflows with Oracle Data Integrator.
Table 5-1 Executing Oozie Workflows
Step | Description |
---|---|
Set up the Oozie runtime engine | Set up the Oozie runtime engine to configure the connection to the Hadoop data server where the Oozie engine is installed. This Oozie runtime engine is used to execute ODI Design Objects or Scenarios on the Oozie engine as Oozie workflows. |
Execute or deploy an Oozie workflow | Run the ODI Design Objects or Scenarios using the Oozie runtime engine created in the previous step to execute or deploy an Oozie workflow. |
Audit Hadoop Logs | Audit the Hadoop Logs to monitor the execution of the Oozie workflows from within Oracle Data Integrator. See Auditing Hadoop Logs. |
5.2 Setting Up and Initializing the Oozie Runtime Engine
Before you set up the Oozie runtime engine, ensure that the Hadoop data server where the Oozie engine is deployed is available in the topology. The Oozie engine must be associated with this Hadoop data server.
To set up the Oozie runtime engine:
5.2.1 Oozie Runtime Engine Definition
The following table describes the fields that you need to specify on the Definition tab when defining a new Oozie runtime engine. An Oozie runtime engine models an actual Oozie server in a Hadoop environment.
Table 5-2 Oozie Runtime Engine Definition
Field | Values |
---|---|
Name | Name of the Oozie runtime engine that appears in Oracle Data Integrator. |
Host | Name or IP address of the machine on which the Oozie runtime agent has been launched. |
Port | Listening port used by the Oozie runtime engine. The default Oozie port is 11000. |
Web application context | Name of the web application context. Type |
Protocol | Protocol used for the connection. Possible values are |
Hadoop Server | Name of the Hadoop server where the Oozie engine is installed. This Hadoop server is associated with the Oozie runtime engine. |
Poll Frequency | Frequency at which the Hadoop audit logs are retrieved and stored in the ODI repository as session logs. The poll frequency can be specified in seconds (s), minutes (m), hours (h), days (d), and years (y). For example, 5m or 4h. |
Lifespan | Time period for which the Hadoop audit logs retrieval coordinator stays enabled to schedule audit logs retrieval workflows. The lifespan can be specified in minutes (m), hours (h), days (d), and years (y). For example, 4h or 2d. |
Schedule Frequency | Frequency at which the Hadoop audit logs retrieval workflow is scheduled as an Oozie Coordinator Job. The schedule frequency can be specified in minutes (m), hours (h), days (d), and years (y). For example, 20m or 5h. |
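The duration values accepted by Poll Frequency, Lifespan, and Schedule Frequency can be sanity-checked before they are entered on the Definition tab. The following is a minimal sketch, not the parser that ODI itself uses: the helper name `parse_duration` is hypothetical, and both the `y` suffix for years and the 365-day year are assumptions made for illustration.

```python
import re

# Seconds per unit. The letters s/m/h/d come from the field descriptions;
# "y" for years and the 365-day year are assumptions made for this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}

# A duration is a whole number followed by a single unit letter, e.g. "5m".
_DURATION_RE = re.compile(r"^(\d+)([a-z])$")


def parse_duration(value, allowed_units="smhdy"):
    """Convert a value such as '5m' or '4h' to seconds (hypothetical helper)."""
    match = _DURATION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a <number><unit> duration: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit not in allowed_units or unit not in _UNIT_SECONDS:
        raise ValueError(f"unit {unit!r} is not allowed in {value!r}")
    return amount * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    # Poll Frequency accepts seconds; Lifespan and Schedule Frequency do not.
    print(parse_duration("5m"))
    print(parse_duration("2d", allowed_units="mhdy"))
```

Passing `allowed_units="mhdy"` mirrors the fact that only Poll Frequency lists seconds as a unit, so a Lifespan of `30s` is rejected by this check.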
5.2.2 Oozie Runtime Engine Properties
The following table describes the properties that you can configure on the Properties tab when defining a new Oozie runtime engine.
Table 5-3 Oozie Runtime Engine Properties
Field | Values |
---|---|
OOZIE_WF_GEN_MAX_DETAIL | Limits the maximum detail (session level or fine-grained task level) allowed when generating ODI Oozie workflows for an Oozie engine. Set the value of this property to TASK to generate an Oozie action for every ODI task or to SESSION to generate an Oozie action for the entire session. |
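The effect of the two OOZIE_WF_GEN_MAX_DETAIL values can be pictured with a small conceptual model. This sketch does not reproduce how ODI generates workflows; the function `plan_oozie_actions`, the action naming scheme, and the sample task names are all hypothetical and serve only to show the difference in granularity.

```python
def plan_oozie_actions(session_name, task_names, max_detail):
    """Return illustrative action names for a session (conceptual model only)."""
    if max_detail == "TASK":
        # One Oozie action for every ODI task in the session.
        return [f"{session_name}_{task}" for task in task_names]
    if max_detail == "SESSION":
        # A single Oozie action that covers the entire session.
        return [session_name]
    raise ValueError(f"unsupported OOZIE_WF_GEN_MAX_DETAIL value: {max_detail!r}")


if __name__ == "__main__":
    tasks = ["create_work_table", "load_data", "drop_work_table"]  # sample names
    print(plan_oozie_actions("my_session", tasks, "TASK"))
    print(plan_oozie_actions("my_session", tasks, "SESSION"))
```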
5.3 Creating a Logical Oozie Engine
To create a logical Oozie engine:
- In Topology Navigator, right-click the Agents node in the Logical Architecture navigation tree.
- Select New Logical Oozie Engine.
- Fill in the Name.
- For each Context in the left column, select an existing Physical Agent in the right column. This Physical Agent is automatically associated with the Logical Oozie Engine in this context.
- From the File menu, click Save.
5.4 Executing or Deploying an Oozie Workflow
You can run an ODI design-time object such as a Mapping or a runtime object such as a Scenario using an Oozie Workflow. When running the ODI design object or scenario, you can choose to only deploy the Oozie workflow without executing it.
Note:
To enable Sqoop logging when executing an Oozie workflow, add the following property to the data server: HADOOP_CLIENT_OPTS="-Dlog4j.debug -Dhadoop.root.logger=INFO,console -Dlog4j.configuration=file:/etc/hadoop/conf.cloudera.yarn/log4j.properties"
To execute an ODI Oozie workflow:
- From the Projects menu of the Designer navigator, right-click the mapping that you want to execute as an Oozie workflow and click Run.
- From the Logical Agent drop-down list, select the Oozie runtime engine.
- Click OK.
  The Information dialog appears.
- Check if the session started and click OK on the Information dialog.
To deploy an ODI Oozie workflow:
- From the Load Plans and Scenarios menu of the Designer navigator, right-click the scenario that you want to deploy as an Oozie workflow and click Run.
- From the Logical Agent drop-down list, select the Oozie runtime engine.
- Select Deploy Only to process the scenario, generate the Oozie workflow, and deploy it to HDFS.
- Click OK.
  The Information dialog appears.
- Check if the session started and click OK on the Information dialog.
5.5 Auditing Hadoop Logs
When the ODI Oozie workflows are executed, log information is retrieved and captured according to the frequency properties on the Oozie runtime engine. This information relates to the state, progress, and performance of the Oozie job.
You can retrieve the log data of an active Oozie session by clicking Retrieve Log Data in the Operator menu. You can also view information about the Oozie session in the Oozie web console or the MapReduce web console by clicking the URL available in the Definition tab of the Session Editor.
The Details tab in the Session Editor, Session Step Editor, and Session Task Editor provides a summary of the Oozie and MapReduce jobs.
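Outside ODI, the same job state and logs can be cross-checked against the Oozie server directly. The sketch below only builds the request URL and does not contact any server. It relies on the job resource of the Apache Oozie Web Services API (`/v1/job/<job-id>?show=info` or `show=log`), which is part of Oozie rather than ODI. The helper name `oozie_job_url`, the example host, the example job ID, and the `oozie` web application context are assumptions; use the values configured on your own Oozie runtime engine.

```python
from urllib.parse import quote, urlencode


def oozie_job_url(host, job_id, show="info", port=11000,
                  context="oozie", protocol="http"):
    """Build an Oozie Web Services URL for one job (hypothetical helper).

    The port default matches the default Oozie port (11000); the "oozie"
    context and the "http" protocol are assumptions for this sketch.
    """
    base = f"{protocol}://{host}:{port}/{context.strip('/')}"
    query = urlencode({"show": show})  # "info" for status, "log" for job log
    return f"{base}/v1/job/{quote(job_id, safe='')}?{query}"


if __name__ == "__main__":
    job = "0000001-200101000000000-oozie-oozi-W"  # made-up job ID
    print(oozie_job_url("oozie.example.com", job, show="info"))
    print(oozie_job_url("oozie.example.com", job, show="log"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client, subject to the authentication that the cluster requires.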
5.6 Userlib jars support for running ODI Oozie workflows
Support of userlib jars for ODI Oozie workflows allows a user to copy jar files into a userlib HDFS directory, which is referenced by ODI Oozie workflows that are generated and submitted with the oozie.libpath property.
This avoids replicating the libs/jars in the lib HDFS directory of each workflow application. The userlib directory is located in HDFS in the following location:
<ODI HDFS Root>/odi_<version>/userlib
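As an illustration of how the location above resolves, the following sketch builds the userlib path and the `hdfs dfs -put` command that would copy a jar into it. It only constructs the command and does not run it. The helper names, the example HDFS root `/user/odi`, the version string `12.2.1`, and the jar name are hypothetical; the exact form of `<version>` depends on your installation.

```python
import posixpath


def userlib_dir(odi_hdfs_root, odi_version):
    """Resolve <ODI HDFS Root>/odi_<version>/userlib (hypothetical helper)."""
    return posixpath.join(odi_hdfs_root, f"odi_{odi_version}", "userlib")


def put_jar_command(local_jar, odi_hdfs_root, odi_version):
    """Return the argument list for copying a local jar into the userlib directory."""
    # "hdfs dfs -put -f" overwrites an existing file with the same name.
    return ["hdfs", "dfs", "-put", "-f", local_jar,
            userlib_dir(odi_hdfs_root, odi_version)]


if __name__ == "__main__":
    # Example values only; substitute your own root, version, and jar.
    print(userlib_dir("/user/odi", "12.2.1"))
    print(" ".join(put_jar_command("my-udfs.jar", "/user/odi", "12.2.1")))
```

On a host with a configured Hadoop client, the returned list can be passed to `subprocess.run(..., check=True)` to perform the copy.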