3 Overview of the Components of a Pipeline
A Oracle Stream Analytics pipeline is comprised of many components that define the pipeline.
These components are described in the following topics:
Understanding Query Stage
A query stage is used to configure a SQL-like query on the data stream and comprises additional sources for joins, filters, summaries, group by, time windows, and so on.
For example, the query below calculates hourly total sales where transaction amount is greater than a dollar and outputs the result every 1 second.
Select sum (TransactionAmount) As HourlySales
From SalesStream [Range 1 Hour Slide 1 Second]
Where TransactionAmount > 1
Queries like above or more complex queries can all be configured in the query stage with zero coding and with no intimate knowledge of Continuous Query Language or CQL. The CQL language is similar to SQL but with additional constructs for temporal analytics and pattern matching.
A query stage has the following sub sections:
-
Filter
-
Correlation
-
Summary/Group By
-
Range
-
Evaluation Frequency
What is Filter?
The filter section in a query stage or query group stage allows events in the data stream to be filtered out.
Only events which satisfy the filter condition are passed to the downstream stage. For example, in a data stream containing SensorId
and Temperature
, you can filter events where Temperature
is lower than or equal to 70 degrees by setting the filter condition to Temperature > 70
.
What is Correlation?
A correlation is used to enrich the incoming event in the data stream with static data in a database table or with data from other streams.
For example, if the event in the data stream only includes SensorId
and Sensor Temperature
, the event could be enriched with data from a table to obtain SensorMake
, SensorLocation
, SensorThreshold
, and many more.
Correlating an event with other sources requires the join condition to be based on a common key. In the above example, the SensorId
from the stream cand be used to correlate with SensorKey
in the database table. The following query illustrates the above data enrichment scenario producing sensor details for all sensors whose temperature exceeds their pre-defined threshold.
Select T.SensorId, T.Temperature, D.SensorName, D.SensorLocation
From TemperatureStream[Now] T, SensorDetailsTable D
Where T.SensorId = D.SensorKey And T.Temperature > D.SensorThreshold
Queries like above and more complex queries can be automatically generated by configuring sources and filter sections of the query stage.
What is Summary?
A data stream is a continuous sequence of events but we can summarize the data over any time range including an unbounded range.
For example, you can continuously compute the maximum temperature for each sensor from the beginning of time by configuring a query like the one below in a Query stage.
Select SesnsorId, max(Temperature)
From TemperatureStream
Group By SensorId
What is Group By?
A group by collects the data of all the rows with an identical column value. Group by is used in conjunction with Summaries (aggregate functions) to provide information about each group.
Here is an example configuration that generates a query for computing the average temperature of each sensor at the end of the hour and using readings from last one hour.
Select SesnsorId, avg(Temperature)
From TemperatureStream [Range 1 Hour Slide 1 Hour]
Group By SensorId
Example
If you add multiple group bys, the data is grouped on multiple columns. For example, you have a stream that gives you sales numbers for geographical locations. You have the following columns BEFORE group by:
COUNTRY CITY REVENUE
US SF 500
US NY 1000
INDIA BOMBAY 800
INDIA BOMBAY 1500
INDIA BOMBAY 700
.........
Calculate sum of revenue (summary) by country (groupby) to get:
COUNTRY SUM_OF_REVENUE
US 1500
INDIA 3000
Add CITY as another group by, to get your aggregations grouped by city in addition to country:
COUNTRY CITY SUM_OF_REVENUE
US NY 1000
US SF 500
INDIA BOMBAY 1500
INDIA BANGALORE 1500
What is Range?
A range is a window applied on the data stream. Since data stream is an unbounded sequence of events it is often necessary to apply a window when computing aggregates.
Examples of ranges include – Last 1 Hour
of events, Last 5 Minutes
of events, Last 10 Events
, and many more. Applying a range retains data in memory so be cautious with use of window ranges. For example, if data is arriving at the rate of 2000 events per second and if each event is 1KB then we have 2MB of data per second. Applying a 1-hour window on this data stream consumes 2MB times 3600 or 7.2GB of memory.
The supported time units in a range are:
-
now
-
nanoseconds
-
microseconds
-
milliseconds
-
seconds
-
minutes
-
hours
-
events
What is Evaluation Frequency?
Evaluation Frequency or a Window Slide (commonly referred to) determines the frequency at which results are desired.
For example, the configured query below outputs total sales every 1 second but using transactions from last 1 hour.
Select sum (TransactionAmount) As HourlySales
From SalesStream [Range 1 Hour Slide 1 Second]
In other words, Evaluation Frequency determines how often you want to see the results. In the above query, if result is only desired at the end of the hour then we set the Evaluation Frequency to 1 hour.
Understanding Rules
A rule is a set of conditions applied to the incoming stream and a set of actions performed on the stream when conditions are true. Each event is analyzed independently of other events.
For example, assume that your stream is a stream from pressure sensors and has the following fields:
-
sensor_id
-
pressure
-
status
If you want to assign a status value based on the pressure, you can define the following rules:
-
if the pressure is less than or equal to 50, the status must be set to GREEN
-
if the pressure is between 50 and 100, the status must be set to YELLOW
-
if the pressure is greater than 100, the status must be set to RED.
Understanding Rule Stage
A rule stage is a stage in the pipeline where you apply conditional (IF - THEN
) logic to the events in the stream. You can check for specific conditions and assign values to fields based on the results of your checks.
You can add multiple rules to the stage and they will get applied to pipeline in the sequence they are added. A rule is a set of conditions applied to the incoming stream and a set of actions performed on the stream when conditions are true. Each event is analyzed independently of other events.
Understanding Pattern Stage
Patterns are a stage within a pipeline. When working from a pattern, you need to specify a few key fields to discover an interesting result. You can create pattern stages within the pipeline. Patterns are not stand-alone artifacts. They need to be embedded within a pipeline.
Understanding Custom Stage
Custom Stage is a type of stage where you can apply your custom stage type to your streaming data in your pipeline. It behaves like any other type of stage with data flowing into and out of it. It is close to a pattern stage in the way that you are asked to configure a few parameters before its logic applies to the stream.
Understanding Scoring Stage
Scoring Stage is a type of stage where you apply a machine learning model to your streaming data to do predictive analysis. Scoring Stage is the infrastructure that Oracle Stream Analytics provides to data scientists for machine learning model deployment against streaming data.
Understanding Query Group
A query group stage lets you do aggregations on multiple group bys and multiple windows. It is a collection of groups, where each of the group has its own window, filters that affect the data processing only within that group.
A query group has two types of stages:
-
Stream
-
Table
What is Query Group Stage: Stream?
A query group stage of the type stream is where you can apply aggregate functions with different group-bys and window ranges to your streaming data. You can have multiple query groups in one stage.
What is Query Group Stage: Table?
A query group stage of the type table is where you can apply aggregate functions with different group bys and window ranges to a database table data recreated in memory. Use this stage on a change data capture stream, such as GoldenGate. You can have multiple query groups in one stage.
Understanding the Live Output Table
The Live Output table is the main feedback mechanism from the pipelines that you build. The Live Output table will display events that go out of your pipeline, after your processing logic has been applied on the incoming stream or streams.
The Live Output table will be displayed for each stage of your pipeline and will include output of that particular stage. On the source stage the Live Output table will display events as they arrive in the stream. On the target stage, the Live Output stage will display events as they will flow to the target.
The Live Output table is also a powerful tool for event shape manipulation. With the Live Output table you can:
-
Add new fields to the event using an extensive library of functions in the expression builder, and remove new fields
-
Change the order of the event fields
-
Rename event fields
-
Remove existing fields from the output of the stage
-
Add a timestamp field to each event
-
Hide fields from view (but retain them in the output)
-
Pause and restart event display in the browser (not affecting downstream stages or targets)
The interaction with the table should be intuitively clear to anyone who has worked with popular spreadsheet pipelines.
Expression Builder
The expression builder provides functionality to add new fields to your output based on existing fields. You can use a rich library of functions to manipulate your event data. A simple example is string concatenation; you can construct a full name from first and last names:

Description of the illustration string_concat.png
Note:
The event shape manipulation functionality is available on the table in the query stage.The expression builder has syntax highlighting and code completion. You can also see the function signature, input parameters and the return value in the Expression Editor.

Description of the illustration concat_details.png
Understanding Visualizations
Visualization is mapping of the data (information) to a graphical or tabular format which can be used to answer a specific analytical question.
It translates data, its properties and relationships into an easy to interpretable visual object consisting of points, lines, shapes and colors. It effectively represents the results of the meaningful multi-dimensional questions. It also enables to discover the influential patterns out of the represented data (information) using the visual analysis.
Visualizations
Visualizations are divided into two categories:
-
Axis based
Axis based visualizations display series and groups of data. Series and groups are analogous to the rows and columns of a grid of data. Typically, the rows in the grid appear as a series in visualization, and the columns in the grid appear as groups.
Axis based visualizations enables users to visualize the data along two graph axis x and y like sum of sales over regions or sum of sales over time period. X axis values can be categorical in nature like regions or can be based on time series values whereas Y axis represents the measured value like sum(sales). These charts are useful for visualizing trends in a set of values over time and comparing these values across series.
-
Spatial
Spatial visualizations are used when geography is especially important in analyzing an event. It represents business data superimposed on a single geo fence.
What is a Bar Type of Visualization?
Bar visualization is one of the widely used visualization types which represents data as a series of vertical bars. It is best suited for comparison of the values represented along y axis where different categories are spread across x axis. In a Bar visualization vertical columns represent metrics (measured values). The horizontal axis displays multiple or non-consecutive categories.
In Horizontal Bar, the axis positions are switched. The vertical axis displays multiple or non-consecutive categories. The horizontal columns represents metrics (measured values). It is preferable when the category names are long text values and requires more space in order to be displayed.
What is a Line Type of Visualization?
Line visualization represents data as a line, as a series of data points, or as data points that are connected by a line. Line visualization require data for at least two points for each member in a group. The X-axis is a single consecutive dimension, such as a date-time field, and the data lines are likely to cross. X axis can also have non date-time categories. Y axis represents the metrics (measured value). It is preferred to use line visualization when data set is continuous in nature. It is best suited for trend-based plotting of data over a period of time. In Line visualization, emphasis is on the continuation or the flow of the values (a trend) but individual value comparison can also be carried out. Multiple series can also be compared with the line visualizations.
It can have a horizontal orientation where axis are switched i.e. y axis holds categories whereas x axis shows metrics.
What is An Area Type of Visualization?
Area visualization represents data as a filled-in area. Area visualization requires at least two groups of data along an axis. The X-axis is a single consecutive dimension, such as a date-time field, and the data lines are unlikely to cross. Y axis represents the metrics (measured value). X axis can also have non date-time categories. This visualization is mainly suitable for presenting accumulative value changes over time.
It can have a horizontal orientation where axis are switched i.e. y axis holds categories whereas x axis shows metrics.
What is a Stacked Bar Type of Visualization?
A Stacked visualization displays sets of values stacked in a single segmented column instead of side-by-side in separate columns. It is used to show a composition. Bars for each set of data are appended to previous sets of data. The size of the stack represents a cumulative data total.
What is a Spatial Type of Visualization?
Geo Spatial visualization allows displaying location of an object on a geo fence and takes user to the area where events are occurring. User can configure visualization to specify latitude, longitude, identifier etc. Customization of visualization by specifying different pins like arrows with different colors based on certain condition is also allowed.
What is a Pie Chart?
A pie chart is a circular graphic divided into slices that indicate numerical proportions. The arc length of each slice is proportionate to the quantity it represents.
You can use Pie charts to compare parts of a whole.
What is a Bubble Chart?
A bubble chart displays three dimensions of data. Each entity with its several versions (mostly three) of associated data is plotted as a disk. This disk shows two of the vi values through the disk's xy location and the third through its size.
What is a Thematic Map?
A thematic map focuses on a specific theme or subject area. It includes some locational or reference information and emphasizes spatial variation of one or a small number of geographic distributions. These distributions may be physical phenomena such as climate, population density, traffic congestion, and so on.
What is a Scatter Chart?
A scatter chart is a mathematical diagram that uses Cartesian coordinates to display values for multiple variables for a set of data. You can color code the points to display data for an additional variable. This chart shows how much one variable is affected by another. The relationship between two variables is called their correlation.
Understanding the Topology Viewer
Topology is a graphical representation and illustration of the connected entities and the dependencies between the artifacts.
What is Immediate Family?
Immediate Family context displays the dependencies between the selected entity and its child or parent.
The following figure illustrates how a topology looks in the Immediate Family.
Description of the illustration topology_viewer_immediate.png
What is Extended Family?
Extended Family context displays the dependencies between the entities in a full context, that is if an entity has a child entity and a parent entity, and the parent entity has other dependencies, all the dependencies are shown in the Full context.
The following figure illustrates how a topology looks in the Extended Family.
Understanding Expression Builder Functions
This topic applies only to Oracle user-managed services.
Topics:
What are Bessel Functions?
The mathematical cylinder functions for integers are known as Bessel functions.
Function Name | Description |
---|---|
|
Returns the modified Bessel function of order 0 of the double argument as a double |
|
Returns the exponentially scaled modified Bessel function of order 0 of the double argument as a double |
|
Returns the modified Bessel function of order 1 of the double argument as a double |
|
Returns the exponentially scaled modified Bessel function of order 1 of the double argument as a double |
|
Returns the Bessel function of the first kind of order n of the argument as a double |
|
Returns the modified Bessel function of the third kind of order n of the argument as a double |
|
Returns the exponentially scaled modified Bessel function of the third kind of order 0 of the double argument as a double |
|
Returns the exponentially scaled modified Bessel function of the third kind of order 1 of the double argument as a double |
|
Returns the Bessel function of the second kind of order n of the double argument as a double |
What are Conversion Functions?
The conversion functions help in converting values from one data type to other.
The following conversion functions are supported in this release:
Function Name | Description |
---|---|
|
Converts the given value to bigdecimal |
|
Converts the given value to logical |
|
Converts the given value to datetime |
|
Converts the given value to double |
|
Converts the given value to float |
|
Converts the given value to integer |
|
Converts the given value to long |
|
Converts the given value to string |
What are Date Functions?
The following date functions are supported in this release:
Function Name | Description |
---|---|
|
Returns day of the date |
|
Returns event timestamp from stream |
|
Returns hour of the date |
|
Returns minute of the date |
|
Returns month of the date |
|
Returns nanosecond of the date |
|
Returns second of the date |
|
Returns the system’s timestamp on which the application is running |
|
Returns the provided timestamp in required time format |
|
Returns the current output time |
|
Returns year of the date |
What are Geometry Functions?
The Geometry functions allow you to convert the given values into a geometrical shape.
The following interval functions are supported in this release:
Function Name | Description |
---|---|
|
Returns a 2–dimensional point type geometry from the given latitude and longitude. The default SRID is 8307. The return value is of the datatype |
|
Returns distance between the first set of latitude, longitude and the second set of latitude, longitude values. The default SRID is 8307. The return value is of the datatype |
What are Interval Functions?
The Interval functions help you in calculating time interval from given values.
The following interval functions are supported in this release:
Function Name | Description |
---|---|
|
Converts the given value to an The return value is of the datatype |
|
Converts a string in format The return value is of the datatype |
The YM Interval Functions
The YM Interval functions help you in calculating time interval from year to month.
The following are the YM interval functions:
Function Name | Description |
---|---|
|
In the SQL format, years is an integer between In the ISO format, years and months are integers between |
|
NUMTOYMINTERVAL converts number n to an The return value is of the datatype |
What are Math Functions?
The math functions allow you to perform various mathematical operations and calculations ranging from simple to complex.
The following math functions are supported in this release:
Function Name | Description |
---|---|
|
Computes the remainder operation on two arguments as prescribed by the IEEE 754 standard |
|
Returns the absolute value of a number |
|
Returns arc cosine of a value |
|
Returns arc sine of a value |
|
Returns arc tangent of a value |
|
Returns polar angle of a point ( |
|
Returns binomial coefficient of the base raised to the specified power |
|
BitMask with BitsSet (From, To) |
|
Returns cubic root of the specified value |
|
Rounds to ceiling |
|
Returns the first floating-point argument with the sign of the second floating-point argument |
|
Returns cosine of a value |
|
Returns cosine hyperbolic of a value |
|
Returns exponent of a value |
|
More precise equivalent of |
|
Returns factorial of a natural number |
|
Rounds to floor |
|
Returns the unbiased exponent used in the representation of a double |
|
Returns a deterministic seed as an integer from a (seemingly gigantic) matrix of predefined seeds |
|
Returns an integer hashcode for the specified double value |
|
Returns square root of sum of squares of the two arguments |
|
Returns the least significant 64 bits of this UUID's 128 bit value |
|
Calculates the log value of the given argument to the given base, where |
|
Returns the natural logarithm of a number |
|
Calculates the log value of the given argument to base 10 |
|
Calculates the log value of the given argument to base 2 |
|
Returns the natural logarithm (base e) of the factorial of its integer argument as a double |
|
Returns the factorial of its integer argument (in the range k >= 0 && k < 21) as a long |
|
Returns the maximum of 2 arguments |
|
Returns the minimum of 2 arguments |
|
Returns modulo of a number |
|
Returns the most significant 64 bits of this UUID's 128 bit value |
|
Returns the floating-point number adjacent to the first argument in the direction of the second argument |
|
Returns the floating-point value adjacent to the input argument in the direction of negative infinity |
|
Returns the floating-point value adjacent to the input argument in the direction of positive infinity |
|
Returns m raised to the nth power |
|
Returns the double value that is closest in value to the argument and is equal to a mathematical integer |
|
Rounds to the nearest integral value |
|
Returns d × 2scaleFactor rounded as if performed by a single correctly rounded floating-point multiply to a member of the double value set |
|
Returns signum of an argument as a double value |
|
Returns sine of a value |
|
Returns sine hyperbolic of a value |
|
Returns square root of a value |
|
Returns the correction term of the Stirling approximation of the natural logarithm (base e) of the factorial of the integer argument as a double |
|
Returns tangent of a value |
|
Returns tangent hyperbolic of a value |
|
Converts the argument value to degrees |
|
Returns the measurement of the angle in radians |
|
Returns the size of an ulp of the argument |
What are Null-related Functions?
The following null-related functions are supported in this release:
Function Name | Description |
---|---|
|
Replaces null with a value of the same type |
What are Statistical Functions?
Statistical functions help you in calculating the statistics of different values.
The following statistical functions are supported in this release:
Function Name | Description |
---|---|
|
Returns the area from zero to |
|
Returns the area under the right hand tail (from |
|
Returns the sum of the terms 0 through |
|
Returns the sum of the terms |
|
Returns the area under the left hand tail (from 0 to |
|
Returns the area under the right hand tail (from |
|
Returns the error function of the normal distribution |
|
Returns the complementary error function of the normal distribution |
|
Returns the gamma function of the arguments |
|
Returns the integral from |
|
Returns the incomplete beta function evaluated from zero to |
|
Returns the incomplete gamma function |
|
Returns the complemented incomplete gamma function |
|
Returns the natural logarithm of the gamma function |
|
Returns the sum of the terms 0 through |
|
Returns the sum of the terms |
|
Returns the area under the normal (Gaussian) probability density function, integrated from minus infinity to |
|
Returns the value for which the area under the normal (Gaussian) probability density function is equal to the argument |
|
Returns the sum of the first |
|
Returns the sum of the terms |
|
Returns the integral from minus infinity to |
|
Returns the value, for which the area under the Student-t probability density function is equal to |
What are String Functions?
The following String functions are supported in this release:
Function Name | Description |
---|---|
|
Returns the first non-null expression in the list. If all expressions evaluate to null, then the COALESCE function will return null |
|
Returns concatenation of values converted to strings |
|
Returns first index of |
|
Returns a specified text expression, with the first letter of each word in uppercase and all other letters in lowercase |
|
Returns the length of the specified string |
|
Returns a matching pattern |
|
Converts the given string to lower case |
|
Pads the left side of a string with a specific set of characters (when |
|
Removes all specified characters from the left hand side of a string |
|
Replaces all |
|
Pads the right side of a string with a specific set of characters (when |
|
Removes all specified characters from the right hand side of a string |
|
Returns substring of a 'string' when indices are between 'from' (inclusive) and up to the end of the string |
|
Returns substring of a \'string\' when indices are between \'from\' (inclusive) and \'to\' (exclusive) |
|
Replaces a sequence of characters in a string with another set of characters. However, it replaces a single character at a time. |
|
Converts given string to uppercase |