Enhance Visualizations with Statistical Analytics
Statistical analytics enable you to highlight clusters or outliers, add forecasts, and show trend and reference lines in your workbooks.
Before You Start with Statistical Analytics
To add statistical analytics to your workbooks such as forecasts, outliers, and trend lines, you can either use ready-to-use analytics on the Analytics pane of the Data Panel, or use functions in expression builder if you need more control over the configuration.
Oracle Analytics enables you to add a range of statistical analytics from the Analytics pane of the Data Panel, which come fully configured so that you don't need to be a statistical expert to achieve results.
If you need more control over statistical settings, or you want to use the analytic in other visualizations, consider adding a calculation and using the expression builder to define the equivalent function. (From the Data pane, click Add (+), then Create Calculation to display the expression builder.) For example, you might use the FORECAST() function.
See Create a Calculated Data Element.
You can also access the statistical analytics options by right-clicking a visualization and selecting Add Statistics.
What Statistical Analytics Can I Add to Visualizations?
Add these statistical analytics to your visualizations to achieve better insights into your data.
Forecast
The forecast function uses linear regression to predict future values based on existing values along a linear trend.
You can set a number of time periods in the future for which you want to predict the value, based on your existing time series data. See Add a Forecast to a Visualization.
Oracle supports these forecast model types:
- Auto-Regressive Integrated Moving Average (ARIMA) - Use if your past time series data is nonseasonal but provides enough observations (at least 50, but preferably more than 100 observations) to explain and project the future.
- Seasonal ARIMA - Use if your data has a regular pattern of changes that repeat over time periods. For example, seasonality in monthly data might be when high values occur during summer months and low values occur during winter months.
- Exponential Triple Smoothing (ETS) - Use to analyze repetitive time series data that doesn't have a clear pattern. This model type produces an exponential moving average that takes into account the tendency of data to repeat itself in intervals over time.
Alternatively, create a custom calculation using the FORECAST function to have more control over settings, or if you want to use the forecast in other visualizations. See Analytics Functions.
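To illustrate the idea of predicting future values along a linear trend, here is a minimal Python sketch that fits a least-squares line to a short series and extends it by a number of periods. It isn't how Oracle Analytics implements forecasts (the ARIMA, Seasonal ARIMA, and ETS model types are more involved), and the function name `linear_forecast` and the sample values are hypothetical.

```python
def linear_forecast(values, num_periods):
    """Fit y = intercept + slope * t by least squares, then extrapolate.

    Hypothetical helper for illustration only; not an Oracle Analytics API.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2          # mean of the time index 0..n-1
    y_mean = sum(values) / n
    # slope = covariance(t, y) / variance(t)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    # Predict the next num_periods time steps after the last observation.
    return [intercept + slope * t for t in range(n, n + num_periods)]


history = [10.0, 12.0, 14.0, 16.0]   # made-up, perfectly linear series
forecast = linear_forecast(history, 3)
print(forecast)  # [18.0, 20.0, 22.0]
```

Here `num_periods` plays the role of the number of future time periods that you set for the forecast.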
Clusters
The cluster function groups a set of objects in such a way that objects in the same group show more coherence and proximity to each other than to objects in other groups. For example, you can use colors in a scatter chart to show clusters of different groups. See Create a Cluster or Outlier in a Visualization.
- K-means clustering - Use to partition "n" observations into "k" clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
- Hierarchical clustering - Use to create a hierarchy of clusters built using either an agglomerative (bottom-up) approach, or a divisive (top-down) approach.
Alternatively, create a custom calculation using the CLUSTER function to have more control over settings, or if you want to use the cluster in other visualizations. See Analytics Functions.
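If you want to see how K-means clustering assigns each observation to the cluster with the nearest mean, the following Python sketch shows the basic loop. It's a simplified illustration under stated assumptions (fixed starting centroids, Euclidean distance, made-up points), not the implementation that Oracle Analytics uses; the function name `k_means` is hypothetical.

```python
import math


def k_means(points, centroids, max_iter=10):
    """Assign each point to its nearest centroid, then recompute the means.

    Hypothetical helper for illustration only; starting centroids are supplied
    by the caller so that the result is deterministic.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a scatter chart, the labels would correspond to the colors that distinguish each cluster.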
Outliers
The outliers function displays data records that are located the furthest away from the average expectation of individual values. For example, extreme values that deviate the most from other observations fall into this category. Outliers can indicate variability in measurement, experimental errors, or a novelty. If you add outliers to a chart that already has clusters, then the outliers are depicted as different shapes.
Outliers can use K-means clustering or hierarchical clustering. See Create a Cluster or Outlier in a Visualization.
Alternatively, create a custom calculation using the OUTLIER function to have more control over settings, or if you want to use the outlier in other visualizations. See Analytics Functions.
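As a rough illustration of "records located the furthest away from the average", the following Python sketch ranks points by their distance from the overall mean point. This is a deliberate simplification: Oracle Analytics bases outliers on K-means or hierarchical clustering, and the function name `furthest_from_average` and the sample records are hypothetical.

```python
import math


def furthest_from_average(points, count):
    """Return the `count` points with the largest Euclidean distance from the mean.

    Hypothetical helper for illustration only; a simplification of
    cluster-based outlier detection.
    """
    mean = tuple(sum(axis) / len(points) for axis in zip(*points))
    ranked = sorted(points, key=lambda p: math.dist(p, mean), reverse=True)
    return ranked[:count]


records = [(10.0, 11.0), (11.0, 10.0), (10.5, 10.5), (9.5, 10.0), (30.0, 2.0)]
print(furthest_from_average(records, 1))  # [(30.0, 2.0)]
```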
Reference Lines
The reference lines function defines horizontal or vertical lines in a chart that correspond to the X-axis or Y-axis values. See Add a Reference Line to a Visualization.
- Line - You can choose to compute the line as the average, minimum, or maximum. For example, in the airline industry, if passenger turnout is plotted against time, the reference line can show whether passenger turnout for a particular month is above or below average.
- Band - A band represents the upper and lower range of data points. You can choose a custom option or a standard deviation function, and set the band limits to average, maximum, or minimum. For example, if you're analyzing sales by month and you use a custom reference band from average to maximum, you can identify months where sales are above average, but below the maximum.
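The sales example above can be sketched in a few lines of Python. The month names and sales figures are made up for illustration; the sketch only shows the arithmetic behind an average reference line and a custom band from average to maximum.

```python
# Made-up monthly sales figures for illustration.
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 140, "Apr": 180, "May": 110, "Jun": 160}

# Reference line values.
average = sum(monthly_sales.values()) / len(monthly_sales)
maximum = max(monthly_sales.values())

# Months above the average reference line.
above_average = [month for month, sales in monthly_sales.items() if sales > average]

# Months inside a custom band from average to maximum (limits excluded).
in_band = [month for month, sales in monthly_sales.items() if average < sales < maximum]

print(above_average)  # ['Mar', 'Apr', 'Jun']
print(in_band)        # ['Mar', 'Jun']
```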
Trend Lines
The trend line function indicates the general course of the metric in question. A trend line is a straight line connecting a number of points on a graph. A trend line helps you analyze the specific direction of a group of value sets in a visualization. See Add Statistical Analytics to Visualizations.
- Linear - Use with linear data. Your data is linear if the pattern in its data points resembles a line. A linear trend line shows that your metric is increasing or decreasing at a steady rate.
- Polynomial - Use this curved line when data fluctuates. It's useful, for example, for analyzing gains and losses over a large dataset.
- Exponential - Use this curved line when data values rise or fall at increasingly higher rates. You can't create an exponential trend line if your data contains zero or negative values.
Alternatively, create a custom calculation using the TRENDLINE function to have more control over settings, or if you want to use the trend line in other visualizations. See Analytics Functions.
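One common way to fit an exponential curve is to run a linear fit on the logarithm of the values, which helps explain why zero or negative values are a problem: their logarithm is undefined. The following Python sketch assumes that approach; whether Oracle Analytics fits exponential trend lines this way is an assumption, and the function name `exponential_trend` is hypothetical.

```python
import math


def exponential_trend(values):
    """Fit y = a * exp(b * t) by least squares on log(y).

    Hypothetical helper for illustration only. Requires strictly positive
    values because log(y) is undefined for zero or negative y.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    if any(v <= 0 for v in values):
        raise ValueError("exponential trend line requires values greater than zero")
    logs = [math.log(v) for v in values]
    n = len(logs)
    t_mean = (n - 1) / 2
    log_mean = sum(logs) / n
    # Slope of the straight line fitted to (t, log y).
    b = sum((t - t_mean) * (ly - log_mean) for t, ly in enumerate(logs)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = math.exp(log_mean - b * t_mean)
    return a, b


a, b = exponential_trend([2.0, 4.0, 8.0, 16.0])  # values double each period
print(round(a, 6), round(b, 6))
```

For this doubling series, `a` is approximately 2 and `b` is approximately the natural logarithm of 2. Passing a series that contains zero or a negative value raises `ValueError`.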
Add Statistical Analytics to Visualizations
Statistical analytics enable you to highlight clusters or outliers, add forecasts, and show trend and reference lines in your workbooks. Select them on the Analytics tab of the Data pane in the workbook editor.
Alternatively, you can add forecasts, trendlines, and clusters to a workbook using text-only analytics functions. See Analytics Functions.
Before you can use analytic functions in visualizations, you must do the following:
- Install DVML.
  On Windows, go to Start, browse to and expand your system's Oracle folder, and click Install DVML.
  On Mac, go to Applications and click Oracle Analytics Desktop Configure Python.
- Create a workbook or visualization that you can apply one or more analytic functions to.
Add a Forecast to a Visualization
Add forecasts to your workbooks based on Auto-Regressive Integrated Moving Average (ARIMA), Seasonal ARIMA, or Exponential Triple Smoothing (ETS). For example, you might want to forecast summer temperatures based on data from previous summers.
Before you can use analytic functions in visualizations, you must do the following:
- Install DVML.
  On Windows, go to Start, browse to and expand your system's Oracle folder, and click Install DVML.
  On Mac, go to Applications and click Oracle Analytics Desktop Configure Python.
- Create a workbook or visualization that you can apply one or more analytic functions to.
Add a Reference Line to a Visualization
Reference lines enable you to identify averages, medians, percentiles, and similar information in a visualization.
You can use measure, attribute, date, and derived date columns to create reference lines and bands.
Derived dates are columns with different levels of granularity such as Year, Quarter, Month, and Day. Oracle Analytics automatically generates derived date columns for any Date, Time, or Timestamp columns in datasets.
You can bind a parameter to a reference line value or a reference band range in a visualization when you want to use a parameter value to place the reference line or band on the visualization. See Bind a Parameter to a Reference Line or Band.
When you configure the reference line in the Analytics pane on the Properties pane in the Grammar Panel, you might, for example, select the Type option to display a line or a band, use the Function option to change the default line to Average, Percentile, or Top N, or use the Z Order option for date and date order columns to position the reference line in front of or behind a visualization. If you select a non-date attribute column, for example City, you can choose a Value, for example Chicago, on which to display the reference line.
- On the Home page, hover over a workbook, click Actions, then select Open.
- In the Data pane, click the Analytics icon.
- Click Add Statistics, and select Reference Line.
- Use Column to select a measure, date, or non-date attribute.
- In the Analytics pane, select the properties to update.
- Click Save.