2 Working with Oracle Stream Analytics
This topic applies only to Oracle user-managed services.
Home Page
The Home page is the first page that you see when you log in to Oracle Stream Analytics. This page lists the industry verticals that Oracle Stream Analytics supports.
Each industry vertical has a tag associated with it and the tags are case-sensitive.
-
Distributed Intelligence for IOT - Acquire, analyze, and act on high-volume, high-velocity data from sensors and devices both at the edge and in the data center in real-time. Tag for this vertical is IOT.
-
Risk and Fraud Management - Leverage industry's best stream processing platform to assess risk and prevent financial fraud in real-time. Tag for this vertical is risk.
-
Transportation and Logistics - Manage fleet, track assets, and improve supply chain efficiencies by combining streaming data with Oracle's advanced spatial functions. Tag for this vertical is transportation.
-
Customer Experience and Consumer Analytics - Know the sentiment of your customers to reduce churn, improve loyalty, make offers, and attract customers in real-time. Tag for this vertical is customer.
-
Telecommunications - Proactively monitor your networks, predict network failures, and prevent distributed denial of service attacks. Tag for this vertical is telecom.
-
Retail — Understand and apply instant retail shopping trends, instigate beneficial shelf-life patterns and placements, respond to customers' cart utilization, and interoperate with advanced vending machines. Tag for this vertical is retail.
You can navigate to the Catalog or the Patterns page from the home page to get started with Oracle Stream Analytics.
About the Catalog
The Catalog page is the location where resources including pipelines, streams, references, maps, connections, targets, dashboards, predictive models, custom jars, visualizations, and cubes are listed. This is the go-to place for you to perform any tasks in Oracle Stream Analytics.
You can mark a resource as a favorite in the Catalog by clicking on the Star icon. Click the icon again to remove it from your favorites. You can also delete a resource or view its topology using the menu icon to the right of the favorite icon.
The tags applied to items in the Catalog are also listed on the screen below the left navigation pane. You can click any of these tags to display only the items with that tag in the Catalog. The tag appears at the top of the screen. Click Clear All at the top of the screen to clear the tag filter and display all the items.
You can include or exclude pipelines, streams, references, predictive models, geo fences, connections, targets, custom jars, visualizations, dashboards, and cubes using the View All link in the left panel under Show Me. When you click View All, a check mark appears beside it and all the components are displayed in the Catalog.
When you want to display or view only a few or selective items in the Catalog, deselect View All and select the individual components. Only the selected components will appear in the Catalog.
Typical Workflow for Administering Oracle Stream Analytics
The typical workflow lists the artifacts required to create a pipeline in Oracle Stream Analytics.
The prerequisites for a pipeline are:
-
A connection is required to create a stream, except for a file stream.
-
A stream is required to create a pipeline.
Cache Configuration for Coherence
Oracle Stream Analytics requires a special Coherence cache configuration and proxy scheme so that it can connect to the Coherence cluster.
To enrich stream data with reference data from an external Coherence cluster, you must access the external cluster using the extend client APIs. To access the external cluster as a client, you need to configure cache-config with ExtendTcpCacheService and ExtendTcpInvocationService.
Configure the Coherence Cluster
Make sure that Coherence for Java is installed.
To configure the external cluster as client:
-
Create an XML file named cache-config.xml.
-
Copy the following XML to the file:
<?xml version="1.0"?>
<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
              xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config coherence-cache-config.xsd">
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>externalcache*</cache-name>
      <scheme-name>remote</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <remote-cache-scheme>
      <scheme-name>remote</scheme-name>
      <service-name>ExtendTcpCacheService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>localhost</address>
              <port>9099</port>
            </socket-address>
          </remote-addresses>
        </tcp-initiator>
        <outgoing-message-handler>
          <request-timeout>5s</request-timeout>
        </outgoing-message-handler>
      </initiator-config>
    </remote-cache-scheme>
    <remote-invocation-scheme>
      <scheme-name>extend-invocation</scheme-name>
      <service-name>ExtendTcpInvocationService</service-name>
      <initiator-config>
        <tcp-initiator>
          <remote-addresses>
            <socket-address>
              <address>localhost</address>
              <port>9099</port>
            </socket-address>
          </remote-addresses>
        </tcp-initiator>
        <outgoing-message-handler>
          <request-timeout>5s</request-timeout>
        </outgoing-message-handler>
      </initiator-config>
    </remote-invocation-scheme>
  </caching-schemes>
</cache-config>
-
Save and close the file.
-
Test the connection to the cluster.
InvocationService service = (InvocationService) CacheFactory.getConfigurableCacheFactory().ensureService("ExtendTcpInvocationService");
ensureService() throws an exception if there is no Coherence cluster available with the given host and port.
-
Create a coherence reference using a coherence connection.
-
Register the Coherence cache as a reference.
The following sample code registers the Coherence cache as a reference:
override def initialize():Unit = {
repartition = true
val externalEvent = EventType("externalorders",IntAttr("orderId"), VarCharAttr("orderDesc", 20))
val sExtSrcProps = Map(EXT_URL -> "",EXT_ENTITY_NAME -> "externalcache")
val jExtSrcProps = new java.util.HashMap[String,String](sExtSrcProps)
val converter = ConverterFactory(ConverterType.COHERENCE,externalEvent)
cc.registerEventType(externalEvent)
cc.registerRelation(externalEvent).onExternal(jExtSrcProps,ExtSourceType.COHERENCE,converter)
}
def main(args: Array[String]) {
cql = "istream(select R.orderId as orderId, R.orderStatus as orderStatus, Ext.orderDesc as orderDesc from orders[now] as R, externalorders as Ext where R.orderId = Ext.orderId)"
name = "CoherenceCorrelation"
processOrders(args)
}
}
// EXT_URL is not used for Coherence as a reference; it is currently used for web service and database, so it is set to EMPTY.
// EXT_ENTITY_NAME is the cache name of the external Coherence cluster.
For the above example, the Coherence cache must have orderId (Integer) as the key and a Map of values for orderId and orderDesc as the value. Code similar to the following populates a sample cache:
NamedCache cache = CacheFactory.getCache("externalcache");
Map<String,Object> order1 = new HashMap<String, Object>();
order1.put("orderId", new Integer(1));
order1.put("orderDesc", "HP Deskjet v2");
Map<String,Object> order2 = new HashMap<String, Object>();
order2.put("orderId", new Integer(2));
order2.put("orderDesc", "Oracle Database 12");
Map<String,Object> order3 = new HashMap<String, Object>();
order3.put("orderId", new Integer(3));
order3.put("orderDesc", "Apple iPhone6s");
Map<String,Object> order4 = new HashMap<String, Object>();
order4.put("orderId", new Integer(4));
order4.put("orderDesc", "Logitech Mouse");
cache.put(1,order1);
cache.put(2,order2);
cache.put(3,order3);
cache.put(4,order4);
Creating a Stream
A stream is a source of events with a given content (shape).
To create a stream:
-
Navigate to Catalog.
-
Select Stream in the Create New Item menu.
-
Provide details for the following fields on the Type Properties page and click Next:
-
Name — name of the stream
-
Description — description of the stream
-
Tags — tags you want to use for the stream
-
Stream Type — select a suitable stream type. Supported types are File, GoldenGate, JMS, and Kafka.
-
-
Provide details for the following fields on the Source Details page and click Next:
When the stream type is File:
-
File Path or URL — the location of the file that you want to upload
-
Read whole content — select this option if you want to read the whole content of the file
-
Number of events per batch — the number of events that you want to process per batch
-
Loop — select this option if you want to process the file in a loop
-
Data Format — the format of the data. The supported types are: CSV and JSON.
When the stream type is GoldenGate:
-
Connection — the connection for the stream
-
Topic name — the topic name that receives events you want to analyze
-
Data Format — the format of the data. The supported types are: CSV, JSON, AVRO. AVRO is a data serialization system.
When the stream type is JMS:
-
Connection — the connection for the stream
-
Jndi name — the JNDI name of the topic, distributed topic, queue, or distributed queue from which messages are read
-
Client ID — the client ID to use for the durable subscriber
-
Message Selector — the message selector to filter messages. If your messaging application needs to filter the messages it receives, you can use a JMS API message selector, which allows a message consumer to specify the messages it is interested in. Message selectors assign the work of filtering messages to the JMS provider rather than to the application.
A message selector is a String that contains an expression. The syntax of the expression is based on a subset of the SQL92 conditional expression syntax. The message selector in the following example selects any message that has a NewsType property that is set to the value 'Sports' or 'Opinion':
NewsType = 'Sports' OR NewsType = 'Opinion'
The createConsumer and createDurableSubscriber methods allow you to specify a message selector as an argument when you create a message consumer; see the example after this procedure.
-
Subscription ID — the subscription ID for the durable subscriber
-
Data Format — the format of the data. The supported types are: CSV, JSON, AVRO, MapMessage. MapMessage is supported only for JNDI based streams.
A MapMessage object is used to send a set of name-value pairs. The names are String objects, and the values are primitive data types in the Java programming language. The names must have a value that is not null, and not an empty string. The entries can be accessed sequentially or randomly by name. The order of the entries is undefined.
When the stream type is Kafka:
-
Connection — the connection for the stream
-
Topic name — the topic name that receives events you want to analyze
-
Data Format — the format of the data within the stream. The supported types are: CSV, JSON, AVRO.
-
-
Select one of the mechanisms to define the shape on the Shape page:
-
Infer Shape — detects the shape automatically from the input data stream.
You can infer the shape from Kafka, JSON schema file, or CSV message/data file. You can also save the auto detected shape and use it later.
-
Select Existing Shape — lets you choose one of the existing shapes from the drop-down list.
-
Manual Shape — populates the existing fields and also allows you to add or remove columns from the shape. You can also update the datatype of the fields.
-
A stream is created with the specified details.
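As an illustration of how the JMS settings above fit together, the following is a minimal sketch of a JMS client that uses a durable subscription (client ID and subscription name), a message selector, and a MapMessage. The JNDI names (jms/ConnectionFactory, jms/NewsTopic), the client ID, and the subscription name are illustrative assumptions, not values required by Oracle Stream Analytics.

import javax.jms.*;
import javax.naming.InitialContext;

public class DurableSelectorExample {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name
        Topic topic = (Topic) ctx.lookup("jms/NewsTopic");                              // assumed JNDI name

        Connection connection = cf.createConnection();
        connection.setClientID("newsClient01");            // corresponds to the Client ID field
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Message selector: only Sports or Opinion messages are delivered to this consumer.
        String selector = "NewsType = 'Sports' OR NewsType = 'Opinion'";

        // Durable subscriber: the subscription name corresponds to the Subscription ID field.
        MessageConsumer consumer = session.createDurableSubscriber(topic, "newsSubscription01", selector, false);
        connection.start();

        // Producer side: a MapMessage carries a set of name-value pairs.
        MessageProducer producer = session.createProducer(topic);
        MapMessage message = session.createMapMessage();
        message.setString("headline", "Local team wins");
        message.setStringProperty("NewsType", "Sports");   // property evaluated by the selector
        producer.send(message);

        MapMessage received = (MapMessage) consumer.receive(5000);
        if (received != null) {
            System.out.println(received.getString("headline"));
        }
        connection.close();
    }
}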
CSV Data for Pre-defined Formats
When your data format is CSV, select a predefined format based on the variation of CSV data produced by the originating source. The following table describes the CSV data for each of these predefined formats:
| CSV Predefined Format | Description |
|---|---|
| Default | Standard comma separated format, as for RFC 4180 but allowing empty lines. |
| Excel | Excel file format (using a comma as the value delimiter). |
| InformixUnload | Default Informix CSV UNLOAD format. |
| MySQL | Default MySQL format used by the SELECT INTO OUTFILE and LOAD DATA INFILE operations. |
| PostgreSQLCsv | Default PostgreSQL CSV format used by the COPY operation. |
| PostgreSQLText | Default PostgreSQL text format used by the COPY operation. |
| RFC4180 | Comma separated format as defined by RFC 4180. |
| TDF | Tab-delimited format. |
Capabilities of JMS Source
The capabilities of JMS Source are listed in the following table:
| Capability | Description | Comments |
|---|---|---|
| Ability to connect to JMS Cluster | The JMS consumer can connect to a JMS cluster and handle JMS server fail-over. | |
| Message Format support | Map and TextMessage (JSON, CSV, and AVRO). | Does not support XML and Object messages. |
| Message selector | JMS message selector used to filter messages. Only messages that match the selector produce events. | |
| Re-connection | Reconnects to the JMS server or JMS cluster. | |
| Read messages from queue/distributed queue | Reads messages from a JMS queue or distributed queue. | |
| Read messages from topic | Reads messages from a JMS topic. By default the subscriber is non-durable. | |
| Support for Durable subscriber | A durable subscriber registers a durable subscription by specifying a unique identity that is retained by the JMS provider. If the consumer reconnects to the JMS topic, it reads messages from where it last read. | |
| T3 Support | WebLogic JMS protocol. | |
JMS Server Clean Up
When you create a JMS stream and select the durable subscription option (by providing a client ID and subscription ID), Oracle Stream Analytics creates the durable subscription (if not already present) when the pipeline using this stream is running. When you exit the pipeline, unpublish it, or kill the running pipeline, the durable subscription remains on the JMS server. If you do not intend to publish this pipeline anymore, it is advisable to delete the durable subscription from the JMS server and clean up the resources, for example as shown below.
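A minimal sketch of such a cleanup using the standard JMS Session.unsubscribe call. The JNDI name, client ID, and subscription name are illustrative assumptions and must match the values used when the stream was created; the subscription must not have an active consumer when unsubscribe is called.

import javax.jms.*;
import javax.naming.InitialContext;

public class DurableSubscriptionCleanup {
    public static void main(String[] args) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed JNDI name

        Connection connection = cf.createConnection();
        connection.setClientID("newsClient01");          // client ID of the durable subscription
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Removes the durable subscription and releases its resources on the JMS server.
        session.unsubscribe("newsSubscription01");        // subscription ID of the durable subscription

        connection.close();
    }
}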
Creating a Reference
The reference defines a read-only source of reference data to enrich a stream. A stream containing a customer name could use a reference containing customer data to add the customer’s address to the stream by doing a lookup using the customer name.
A database reference is a reference to a specified table in the database. With caching enabled for a database reference, values pulled from the database are maintained in a Coherence cache and served from there on subsequent requests. A database reference requires a database connection.
A Coherence reference can be any external cache defined in a Coherence cluster that holds data from an external system.
To create a reference:
-
Navigate to Catalog.
-
Select Reference in the Create New Item menu.
-
Provide details for the following fields on the Type Properties page and click Next:
-
Name — name of the reference
-
Description — description of the reference
-
Tags — tags you want to use for the reference
-
Reference Type — the reference type of the reference. The supported reference types are: Coherence and Database.
-
-
Provide details for the following fields on the Source Details page and click Next:
When the reference type is Coherence, enter or select appropriate values for:
-
Connection — the connection for the coherence reference
-
Cache name — the name of the cache. Enabling caching improves performance at the cost of higher memory usage in the Spark applications. Caching is supported only for a single equality join condition. When you update the cache, the application receives the updated data very quickly.
A Coherence reference has data in key-value pairs. The key is an object type and the value is a Map<String,Object>, that is, a map of attribute names and values; the attribute list should match the external event type. In this release, only this external schema for key and value is supported.
When the reference type is Database Table, enter or select appropriate values for:
-
Connection — the connection for the database reference
-
Enable Caching — select this option if you want to enable caching
-
Expiry Delay — the amount of time from last update that entries will be kept by the cache before being marked as expired. Any attempt to read an expired entry will result in a reloading of the entry from the configured cache store. This field is enabled only when caching is enabled.
-
-
Provide details for the following fields on the Shape page and click Save:
When the reference type is Coherence:
-
Select Existing Shape — select a shape that you want to use for the reference
Remember:
Ensure that you do not use any of the CQL reserved words as the column names. If you use the reserved keywords, you cannot deploy the pipeline. -
Manual Shape — select this option if you want to define your own shape
Note:
When you load Coherence data, ensure that you include precision and scale for the number type. The join works only when these values are specified. For example:
NamedCache cache = CacheFactory.getCache("externalcachetimestamp");
java.math.BigDecimal big10 = new java.math.BigDecimal("10",new MathContext(58)).setScale(56, RoundingMode.HALF_UP);
Map<String,Object> order1 = new HashMap<String, Object>();
order1.put("strValue", "Test");
order1.put("intervalValue", "+000000002 03:04:11.330000000");
order1.put("orderTag", big10);
cache.put(big10,order1);
When the reference type is Database Table:
-
Shape Name — select a shape that you want to use for the reference
-
When the datatype of the table data is not supported, the table columns do not have an auto-generated datatype. Only the following datatypes are supported:
-
numeric
-
interval day to second
-
text
-
timestamp (without timezone)
-
date time (without timezone)
Note:
The date column cannot be mapped to timestamp. This is a limitation in the current release.
A reference is created with the specified details.
Limitations of Coherence as Reference
With Coherence as a reference, there are a few limitations:
-
You cannot test the connection.
-
You need to specify the cache name manually.
-
Only the equality operator is allowed while establishing a correlation with a Coherence reference.
-
You must use a manual shape.
Creating a Dashboard
A dashboard is a visualization tool that helps you view and analyze the data related to a pipeline using various visualizations. A dashboard can also include visualizations created from cubes.
Dashboards are an analytics feature. You can create dashboards in Oracle Stream Analytics to get a quick view of your metrics.
To create a dashboard:
After you have created the dashboard, it is empty. You need to add visualizations to it.
Editing a Dashboard
To edit a dashboard:
-
Click the required dashboard in the catalog.
The dashboard opens in the dashboard editor.
-
Click the Add a new visualization icon to see a list of existing visualizations. Visualizations from the pipelines as well as from the cube explorations appear here. Go through the list, select one or more visualizations, and add them to the dashboard.
-
Click the Specify refresh interval icon to select the refresh frequency for the dashboard. This is applicable only to cube-based visualizations, not to streaming charts created from a pipeline.
This is just a client-side setting and is not persisted with Superset version 0.17.0.
-
Click the Apply CSS to the dashboard icon to select a CSS. You can also edit the CSS in the live editor.
You can also see the active filter applied to the dashboard by clicking the Active dashboard filters icon. You can save the link to the dashboard or email the link to someone using the Copy the link to the clipboard and Email the link icons respectively.
-
Click the Save icon to save the changes you have made to the dashboard.
-
Hover over the added visualization, click the Explore chart icon to open the chart editor of the visualization.
You can see the metadata of the visualization. You can also move the chart around the canvas, refresh it, or remove it from the dashboard.
A cube exploration provides various options like time granularity, group by, table timestamp format, row limit, filters, and result filters that add more granularity and detail to the dashboard.
-
Click Save as to make the following changes to the dashboard:
-
Overwrite the visualization
-
Overwrite the current visualization with a different name
-
Add the visualization to an existing dashboard
-
Add the visualization to a new dashboard
-
Creating a Cube
A cube is a data structure that helps you quickly analyze the data related to a business problem on multiple dimensions.
To create a cube:
Creating a Target
The target defines a destination for output data coming from a pipeline.
To create a target:
-
Navigate to Catalog.
-
Select Target in the Create New Item menu.
-
Provide details for the following fields on the Type Properties page and click Save and Next:
-
Name — name of the target
-
Description — description of the target
-
Tags — tags you want to use for the target
-
Target Type — the transport type of the target. Supported types are JMS, Kafka, and REST. The target is a sink for the output event. Each target type is a different sink system and therefore requires different configuration parameters.
-
-
Provide details for the following fields on the Target Details page and click Next:
When the target type is JMS:
-
Connection — the connection for the target
-
Jndi name — the topic or queue name defined in JNDI to be used in the target
-
Data Format — select a suitable data format. This is a mandatory field. The supported data format types are: CSV and JSON.
When the target type is Kafka:
-
Connection — the connection for the target
-
Topic Name — the Kafka topic to be used in the target
-
Data Format — select a suitable data format. This is a mandatory field. The supported data format types are: CSV and JSON.
When the target type is REST:
-
URL — enter the REST service URL. This is a mandatory field.
-
Custom HTTP headers — set the custom headers for HTTP. This is an optional field.
-
Batch processing — select this option to send events in batches and not one by one. Enable this option for high throughput pipelines. This is an optional field.
-
Data Format — select a suitable data format. This is a mandatory field.
Click Test connection to check if the connection has been established successfully.
Testing REST targets is a heuristic process that uses proxy settings. The testing process sends a GET request to ping the given URL and reports success if the server returns OK (status code 200). The expected return content type is application/json. A sketch of this kind of check appears after this procedure.
-
-
Provide details for the following fields on the Data Format page and click Next:
When the data format type is CSV:
-
CSV Predefined Format — select a predefined CSV format. The supported formats are: Excel, InformixUnload, InformixUnloadCsv, MySQL, PostgreSQLCsv, PostgreSQLText.
-
Create the header row — select this option if you want to create a header row in the target.
When the data format type is JSON:
-
Create nested json object — select this option if you want a nested json object to be created for the target
-
-
Select one of the mechanisms to define the shape on the Shape page and click Save:
-
Select Existing Shape lets you choose one of the existing shapes from the drop-down list.
-
Manual Shape populates the existing fields and also allows you to add or remove columns from the shape. You can also update the datatype of the fields.
-
A target is created with the specified details.
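As an illustration of the kind of check the Test connection button performs for a REST target, the following minimal sketch sends an HTTP GET request to a hypothetical REST service URL and treats status code 200 (OK) as success. The URL and timeouts are illustrative assumptions.

import java.net.HttpURLConnection;
import java.net.URL;

public class RestTargetPing {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/events");   // assumed REST service URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);

        // Status code 200 (OK) is interpreted as a successful connection test.
        int status = conn.getResponseCode();
        System.out.println(status == HttpURLConnection.HTTP_OK
                ? "Connection test succeeded (HTTP 200)"
                : "Connection test failed (HTTP " + status + ")");
        conn.disconnect();
    }
}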
Creating Target from Pipeline Editor
Alternatively, you can also create a target from the pipeline editor. When you click Create in the target stage, you are navigated to the Create Target dialog box. Provide all the required details and complete the target creation process. When you create a target from the pipeline editor, the shape gets pre-populated with the shape from the last stage.
Creating a Geo Fence
Geo fences are classified into two categories: manual geo fence and database-based geo fence.
Create a Manual Geo Fence
To create a manual geo fence:
-
Navigate to the Catalog page.
-
Click Create New Item and select Geo Fence from the drop-down list.
The Create Geo Fence dialog opens.
-
Enter a suitable name for the Geo Fence.
-
Select Manually Created Geo Fence as the Type.
-
Click Save.
The Geo Fence Editor opens. In this editor you can create the geo fence according to your requirement.
-
Within the Geo Fence Editor, Zoom In or Zoom Out to navigate to the required area using the zoom icons in the toolbar located on the top-left side of the screen.
You can also use the Marquee Zoom tool to zoom a specific area on the map. You can mark an area using the marquee zoom tool, and that area of the map is zoomed.
-
Click the Polygon Tool and mark the area around a region to create a geo fence.
-
Enter a name and description, and click Save to save your changes.
Update a Manual Geo Fence
To update a manual geo fence:
-
Navigate to the Catalog page.
-
Click the name of the geo fence you want to update.
The Geo Fence Editor opens. You can edit/update the geo fence here.
Search Within a Manual Geo Fence
You can search within a geo fence based on the country and a region or address. The search field allows you to search within the available list of countries. When you click the search results tile in the left center of the geo fence and select a result, you are automatically zoomed in to that specific area.
Delete a Manual Geo Fence
To delete a manual geo fence:
-
Navigate to Catalog page.
-
Click Actions, then select Delete Item to delete the selected geo fence.
Create a Database-based Geo Fence
To create a database-based geo fence:
-
Navigate to Catalog page.
-
Click Create New Item and then select Geo Fence from the drop-down list.
The Create Geo Fence dialog opens.
-
Enter a suitable name for the geo fence.
-
Select Geo Fence from Database as the Type.
-
Click Next and select Connection.
-
Click Next.
All tables that have a field of type SDO_GEOMETRY appear in the drop-down list.
-
Select the required table to define the shape.
-
Click Save.
Note:
You cannot edit or update database-based geo fences.
Delete a Database-based Geo Fence
To delete a database-based geo fence:
-
Navigate to Catalog page.
-
Click Actions and then select Delete Item to delete the selected geo fence.
Display the Map Using Tile Layers
Tile layer is the base map that provides immediate geographic context. Tiles are stored in the map tile server. Oracle Stream Analytics supports two types of tile layers: the Open Street Maps tile layer, which is a free map, and the Elocation tile layer, which is an Oracle tile layer. These tile layers contain a huge amount of data pertaining to:
-
Roads, railways, waterways, etc.
-
Restaurants, shops, stations, ATMs, and more
-
Walking and cycling paths
-
Buildings, campuses, etc.
You can choose whether to see the map in the Elocation tile layer or the Open Street Maps tile layer. To set your preference:
-
Click the user name in the top right corner of the screen.
-
Click Preferences. The Preferences page opens.
-
Click Map.
-
Under Tile Layer, choose Open Street Maps Tile Layer option from the drop-down list.
-
Click Save. The map is displayed in the Open Street Maps tile layer.
-
To display the map in Elocation tile layer, follow steps 1 to 3.
-
From the Tile Layer drop-down list, choose Elocation Tile Layer.
-
Click Save. The map is displayed in the Elocation tile layer.
Creating a Predictive Model
Limited Support for Predictive Models
The menu commands for creating Predictive Models and Scoring Stages are marked Beta, for example, Predictive Model (Beta). The Beta label indicates that the functionality has been tested, but is not fully supported. The import and scoring of Predictive Models might contain undocumented limitations and you should use them as is.
Creating a Custom Jar
A custom jar is a user-supplied Jar archive containing Java classes for custom stage types or custom functions that will be used within a pipeline.
Creating a Pipeline
A pipeline is a Spark application where you implement your business logic. It can have multiple stages such as a query stage, a pattern stage, a business rule stage, a query group stage, a custom stage and many more.
To create a pipeline:
-
Navigate to Catalog.
-
Select Pipeline in the Create New Item menu.
-
Provide details for the following fields and click Save:
-
Name — name of the pipeline
-
Description — description of the pipeline
-
Tags — tags you want to use for the pipeline
-
Stream — the stream you want to use for the pipeline
-
A pipeline is created with the specified details.
Configuring a Pipeline
You can configure the pipeline to use various stages like query, pattern, rules, query group, scoring, and custom stage from custom jars.
Pipeline Editor
The canvas on which you edit/update a pipeline and add different stages to the pipeline is called Pipeline Editor.
The pipelines in Oracle Stream Analytics can vary from very simple to highly complex. Complex pipelines have various stages branching out from other stages of the pipeline. In other words, you can add any type of stage to any existing stage in the pipeline.
You can delete any stage that does not have any children without breaking the pipeline. You can expand/collapse a pipeline, switch the layout of the pipeline to vertical or horizontal, and zoom in or zoom out the pipeline. You can adjust the pipeline pane, editor pane, and the live output table pane using the resizing arrows.
The pipeline editor allows you to see the relationship and dependencies between various stages of the pipeline.
Working with Live Output Table
The streaming data in the pipeline appears in a live output table.
Hide/Unhide Columns
In the live output table, right-click columns and click Hide to hide that column from the output. To unhide the hidden columns, click Columns and then click the eye icon to make the columns visible in the output.
Select/Unselect the Columns
Click the Columns link at the top of the output table to view all the columns available. Use the arrow icons to either select or unselect individual columns or all columns. Only columns you select appear in the output table.
Pause/Restart the Table
Click Pause/Resume to pause or resume the streaming data in the output table.
Perform Operations on Column Headers
Right-click on any column header to perform the following operations:
-
Hide — hides the column from the output table. Click the Columns link and unhide the hidden columns.
-
Remove from output — removes the column from the output table. Click the Columns link and select the columns to be included in the output table.
-
Rename — renames the column to the specified name.
-
Function — opens the column in the Expression Builder, where you can perform various operations using the built-in functions.
Add a Timestamp
Include timestamp in the live output table by clicking the clock icon in the output table.
Reorder the Columns
Click and drag the column headers to right or left in the output table to reorder the columns.
Adding a Query Stage
You can include simple or complex queries on the data stream without any coding to obtain refined results in the output.
- Open a pipeline in the Pipeline Editor.
- Right-click the stage after which you want to add a query stage, click Add a Stage, and then select Query.
- Enter a Name and Description for the Query Stage.
- Click Save.
Adding and Correlating Sources and References
You can correlate sources and references in a pipeline.
Adding Filters
You can add filters in a pipeline to obtain more accurate streaming data.
Using the Expression Builder
You can perform calculations on the data streaming in the pipeline using in-built functions of the Expression Builder.
Oracle Stream Analytics supports various functions. For a list of supported functions, see Understanding Expression Builder Functions.
Note:
Currently, you can use expressions only within a query stage.
Adding a Constant Value Column
A constant value is a simple string or number. No calculation is performed on a constant value. Enter a constant value directly in the expression builder to add it to the live output table.
Using Functions
You can select a CQL Function from the list of available functions and select the input parameters. Make sure to begin the expression with "=". Click Apply to apply the function to the streaming data.
You can see custom functions in the list of available functions when you add/import a custom jar in your pipeline.
Adding Visualizations
Visualizations are graphical representation of the streaming data in a pipeline. You can add visualizations on all stages in the pipeline except a target stage.
Creating Visualization - Area Visualization
Area visualization represents data as a filled-in area. Area visualization requires at least two groups of data along an axis. The X-axis is a single consecutive dimension, such as a date-time field, and the data lines are unlikely to cross. Y axis represents the metrics (measured value). X axis can also have non date-time categories. This visualization is mainly suitable for presenting accumulative value changes over time.
Creating Visualization - Bar Visualization
Bar visualization is one of the widely used visualization types which represents data as a series of vertical bars. It is best suited for comparison of the values represented along y axis where different categories are spread across x axis. In a Bar visualization vertical columns represent metrics (measured values). The horizontal axis displays multiple or non-consecutive categories.
Creating Visualization - Bubble Visualization
A bubble chart is a good option when you want to add an additional dimension to a scatter plot chart. Scatter charts compare two values, but you can add bubble size as the third variable in a bubble chart and thus enable comparison. A good example to use bubble chart is to show marketing expenditures vs revenue vs profit.
Creating Visualization - Geo Spatial Visualization
Geo Spatial visualization displays the location of an object on a geo fence and takes the user to the area where events are occurring. You can configure the visualization to specify latitude, longitude, an identifier, and so on. You can also customize the visualization by specifying different pins, such as arrows in different colors, based on certain conditions.
Creating Visualization - Line Visualization
Line visualization represents data as a line, as a series of data points, or as data points that are connected by a line. Line visualization requires data for at least two points for each member in a group. The X-axis is a single consecutive dimension, such as a date-time field, and the data lines are likely to cross. The X-axis can also have non date-time categories. The Y-axis represents the metrics (measured value). Line visualization is preferred when the data set is continuous in nature. It is best suited for trend-based plotting of data over a period of time.
Creating Visualization - Pie Visualization
A pie chart is a circular graph that represents statistical data in slices. The size of each slice is proportional to the quantity of the value it represents.
Creating Visualization - Scatter Visualization
Scatter charts are primarily used for correlation and distribution analysis. This type of chart is good for showing the relationship between two different variables where one correlates to another.
Creating Visualization - Stacked Bar Visualization
A stacked visualization displays sets of values stacked in a single segmented column instead of side-by-side in separate columns. It is used to show a composition. Bars for each set of data are appended to previous sets of data. The size of the stack represents a cumulative data total.
Creating Visualization - Thematic Map
A thematic map is used to represent a particular theme in data connected to a geographical area. This type of map depicts the political, cultural, agricultural, sociological, and many other aspects of the geographic region, be it a city, state, country, or region.
Updating Visualizations
You can perform update operations like edit and delete on the visualizations after you add them.
You can open the visualization in a new window/tab using the Maximize Visualizations icon in the visualization canvas.
Edit Visualization
To edit a visualization:
-
On the stage that has visualizations, click the Visualizations tab.
-
Identify the visualization that you want to edit and click the pencil icon next to the visualization name.
-
In the Edit Visualization dialog box that appears, make the changes you want. You can even change the Y Axis and X Axis selections. When you change the Y Axis and X Axis values, you will notice a difference in the visualization as the basis on which the graph is plotted has changed.
Change Orientation
Based on the data that you have in the visualization or your requirement, you can change the orientation of the visualization. You can toggle between horizontal and vertical orientations by clicking the Flip Chart Layout icon in the visualization canvas.
Delete Visualization
You can delete the visualization if you no longer need it in the pipeline. In the visualization canvas, click the Delete icon available beside the visualization name to delete the visualization from the pipeline. Be careful while you delete the visualization, as it is deleted with immediate effect and there is no way to restore it once deleted.
Delete All Visualizations
You can delete all the visualizations in the stage if you no longer need them. In the visualization canvas, click the Delete All icon to delete all the visualizations of the stage at one go. Be careful while you delete the visualizations, as the effect is immediate and there is no way to restore the deleted visualizations.
Adding a Pattern Stage
Patterns are templatized stages. You supply a few parameters for the template and a stage is generated based on the template.
For detailed information about the various types of patterns, see Patterns.
To add a pattern stage:
Adding a Rule Stage
Using a rule stage, you can add the IF-THEN logic to your pipeline. A rule is a set of conditions and actions applied to a stream.
- Open a pipeline in the Pipeline Editor.
- Right-click the stage after which you want to add a rule stage, click Add a Stage, and then select Rule.
- Enter a Name and Description for the rule stage.
- Click Add a Rule.
- Enter Rule Name and Description for the rule and click Done to save the rule.
- Select a suitable condition in the IF statement, THEN statement, and click Add Action to add actions within the business rules.
Adding a Query Group Stage
A query group stage allows you to use more than one query group to process your data - a stream or a table in memory. A query group is a combination of summaries (aggregation functions), group-bys, filters and a range window. Different query groups process your input in parallel and the results are combined in the query group stage output. You can also define input filters that process the incoming stream before the query group logic is applied, and result filters that are applied on the combined output of all query groups together.
A query group stage of the stream type applies processing logic to a stream. It is in essence similar to several parallel query stages grouped together for the sake of simplicity.
A query group stage of the table type can be added to a stream containing transactional semantics, such as a change data capture stream produced, to give just one example, by the Oracle GoldenGate Big Data plugin. A stage of this type recreates the original database table in memory using the transactional semantics contained in the stream. You can then apply query groups to this table in memory to run real-time analytics on your transactional data without affecting the performance of your database.
Adding a Scoring Stage
Adding a Custom Stage
You can add a custom stage, based on the custom stage types from the custom jars you have uploaded, to process your streaming data.
Adding a Target Stage
- Open the required pipeline in Pipeline Editor.
- Right-click the stage after which you want to add a target stage, click Add a Stage, and then select Target.
- Enter a meaningful name and suitable description for the target stage and click Save.
- In the stage editor, select a target that suits your requirement and start mapping the fields.
- If the existing target does not suit your requirement or if there is no existing target, click Create Target to create a target.
Configuring a Target
Target defines a destination for output data coming from a pipeline.
- Open a pipeline in the Pipeline Editor.
- Select the target node in the pipeline.
- Select a target for the pipeline from the drop-down list.
- Map each of the Target Property and Output Stream Property.
You can also directly create the target from within the pipeline editor. See Creating a Target for the procedure. You can also edit an existing target.
The pipeline is configured with the specified target.
Exporting and Importing a Pipeline and Its Dependent Artifacts
The export and import feature lets you migrate your pipeline and its contents between Oracle Stream Analytics systems (such as development and production) in a matter of a few clicks. You also have the option to migrate only selected artifacts. You can import a pipeline developed with the latest version of Oracle Stream Analytics. On re-import, the existing metadata is overwritten with the newly imported metadata if the pipeline is not published. You can delete the imported artifacts by right-clicking them and selecting Delete.
-
Cubes
-
Dashboards
-
Custom Stages
-
Visualizations
-
File Streams
-
Predictive Models
Publishing a Pipeline
You must publish a pipeline to make the pipeline available for all the users of Oracle Stream Analytics and send data to targets.
A published pipeline will continue to run on your Spark cluster after you exit the Pipeline Editor, unlike the draft pipelines which are undeployed to release resources.
To publish a pipeline:
Using the Topology Viewer
Topology is a graphical representation and illustration of the connected entities and the dependencies between the artifacts.
The topology viewer helps you in identifying the dependencies that a selected entity has on other entities. Understanding the dependencies helps you in being cautious while deleting or undeploying an entity. Oracle Stream Analytics supports two contexts for the topology — Immediate Family and Extended Family.
You can launch the Topology viewer in any of the following ways:
-
Select Show topology from the Catalog Actions menu to launch the Topology Viewer for the selected entity.
-
Click the Show Topology icon in the Pipeline Editor.
Click the Show Topology icon at the top-right corner of the editor to open the topology viewer.
By default, the topology of the entity from which you launch the Topology Viewer is displayed. The context of this topology is Immediate Family, which indicates that only the immediate dependencies and connections between the entity and other entities are shown. You can switch the context of the topology to display the full topology of the entity from which you have launched the Topology Viewer. The topology in an Extended Family context displays all the dependencies and connections in the topology in a hierarchical manner.
Note:
The entity for which the topology is shown has a grey box surrounding it in the Topology Viewer.
Immediate Family
Immediate Family context displays the dependencies between the selected entity and its child or parent.
Extended Family
Extended Family context displays the dependencies between the entities in a full context, that is if an entity has a child entity and a parent entity, and the parent entity has other dependencies, all the dependencies are shown in the Full context.