11 Monitoring BRM Cloud Native Services
Learn how to monitor your Oracle Communications Billing and Revenue Management (BRM) cloud native services by using Prometheus and Grafana.
About Monitoring BRM Cloud Native Services
You can set up monitoring for the following BRM cloud native services:
- CM
- Oracle DM
- Oracle DM shared memory, front-end processes, and back-end processes
- BRM Java applications: RE Loader Daemon, Batch Controller, and EAI Java Server (JS)
- Web Services Manager
- BRM database
The metrics for the database are generated by OracleDB_exporter, and the metrics for all other BRM services are generated directly by BRM cloud native. You use Prometheus to scrape and store the metric data and then use Grafana to display the data in a graphical dashboard.
Setting Up Monitoring for BRM Cloud Native Services
To set up monitoring for BRM cloud native services:
1. Deploy Prometheus in your Kubernetes cluster in one of the following ways:
   - Deploy a standalone version of Prometheus in your cloud native environment. See "Installation" in the Prometheus documentation.
   - Deploy Prometheus Operator. See "prometheus-operator" on the GitHub website.
   For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.
2. Install Grafana. See "Install Grafana" in the Grafana documentation.
   For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.
3. Configure BRM cloud native to collect metrics for its components and export them to Prometheus. See "Configuring BRM Cloud Native to Collect Metrics".
4. Configure how Perflib generates metric data for BRM opcodes. See "Configuring Perflib for BRM Opcode Monitoring".
5. Configure OracleDB_exporter to scrape metrics from your Oracle database and export them to Prometheus. See "Configuring OracleDB_Exporter to Scrape Database Metrics".
6. Create Grafana dashboards to view your metric data. See "Configuring Grafana for BRM Cloud Native".
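If you deploy a standalone Prometheus, its scrape configuration must point at the BRM metrics endpoints. The following sketch shows what such a scrape job could look like; the job name and the target service address are illustrative placeholders (not values from this guide), and the port matches the CM metrics port listed later in this chapter:

```yaml
scrape_configs:
  - job_name: 'brm-cm'                 # placeholder job name
    metrics_path: /metrics
    static_configs:
      # placeholder service address; use the CM service name and
      # namespace from your own deployment
      - targets: ['cm.BrmNameSpace.svc.cluster.local:11961']
```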
Configuring BRM Cloud Native to Collect Metrics
To configure BRM cloud native to collect metrics for its components and then expose them in Prometheus format:
1. In your override-values.yaml file for oc-cn-helm-chart, set the monitoring.prometheus.operator.enable key to one of the following:
   - true if you are using Prometheus Operator.
   - false if you are using a standalone version of Prometheus. This is the default.
2. To collect metrics for the CM, do the following:
   - In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.cm.deployment.perflib_enabled key to true.
   - In the oms-cm-perflib-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".
   - In the oms-cm-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".
3. To collect metrics for Oracle DM shared memory, front-end processes, and back-end processes, set the data.ENABLE_PROCESS_METRICS key to true in the oms-cm-perflib-config ConfigMap.
4. To collect metrics for the dm-oracle pod, do the following:
   - In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.dm_oracle.deployment.perflib_enabled key to true.
   - In the oms-dm-oracle-perflib-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".
   - In the oms-dm-oracle-config ConfigMap, review and update the Perflib configuration. For information about the possible values, see "Configuring Perflib for BRM Opcode Monitoring".
5. To collect metrics for the BRM Java applications (RE Loader Daemon, Batch Controller, and EAI Java Server), set the monitoring.prometheus.jmx_exporter.enable key to true in your override-values.yaml file for oc-cn-helm-chart.
6. To collect metrics for Web Services Manager, set the ocbrm.wsm.deployment.monitoring.isEnabled key to true in your override-values.yaml file for oc-cn-helm-chart.
7. To persist the Perflib timing files in your BRM database, do the following:
   - In your override-values.yaml file for oc-cn-helm-chart, set the ocbrm.perflib.deployment.persistPerflibLogs key to true.
   - Check the values of these Perflib timing-related environment variables in your oms-cm-perflib-config and oms-dm-oracle-perflib-config ConfigMaps: PERFLIB_VAR_TIME, PERFLIB_VAR_FLIST, and PERFLIB_VAR_ALARM. See Table 11-1 for more information.
8. Run the helm upgrade command to update the BRM Helm release:

   helm upgrade BrmReleaseName oc-cn-helm-chart --values OverrideValuesFile -n BrmNameSpace

   where:
   - BrmReleaseName is the release name for oc-cn-helm-chart and is used to track this installation instance.
   - OverrideValuesFile is the path and file name of your override-values.yaml file.
   - BrmNameSpace is the namespace in which to create the BRM Kubernetes objects for the BRM Helm chart.
After you update the Helm release, metrics are exposed to Prometheus through the /metrics endpoint on the following ports:
- CM: port 11961
- Oracle DM shared memory, back-end processes, and front-end processes: port 11961 or port 31961
- Oracle DM: port 12951
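The endpoints above serve metrics in the Prometheus text exposition format. As an illustration only, the following sketch parses that format; the sample payload and its labels are made up, and this simple parser skips HELP/TYPE comments and assumes label values contain no spaces (real Prometheus clients handle full escaping).

```python
# Minimal sketch of parsing the Prometheus text exposition format served
# at /metrics. Assumption: label values contain no spaces; illustrative only.

def parse_metrics(text):
    """Return {metric_name: [(label_string, value), ...]}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        name_part, _, value = line.rpartition(" ")
        if "{" in name_part:
            name, _, labels = name_part.partition("{")
            labels = labels.rstrip("}")
        else:
            name, labels = name_part, ""
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics

# Made-up sample resembling the opcode metrics described in this chapter
sample = """\
# HELP brm_opcode_calls_total The total number of calls for a BRM opcode.
# TYPE brm_opcode_calls_total counter
brm_opcode_calls_total{opcode="PCM_OP_READ_FLDS"} 1200
brm_opcode_errors_total{opcode="PCM_OP_READ_FLDS"} 3
"""
parsed = parse_metrics(sample)
print(parsed["brm_opcode_calls_total"])  # prints [('opcode="PCM_OP_READ_FLDS"', 1200.0)]
```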
Example: Enabling Monitoring for All BRM Components
The following sample override-values.yaml entries enable the collection of metrics for Prometheus for these components:
- CM
- Oracle DM
- Oracle DM shared memory, front-end processes, and back-end processes
- Web Services Manager
- BRM Java applications: RE Loader Daemon, Batch Controller, and EAI Java Server
The entries also configure BRM to persist the Perflib timing files in your BRM database.
monitoring:
  prometheus:
    operator:
      enable: false
    jmx_exporter:
      enable: true
ocbrm:
  cm:
    deployment:
      perflib_enabled: true
  dm_oracle:
    deployment:
      perflib_enabled: true
  perflib:
    deployment:
      persistPerflibLogs: true
  wsm:
    deployment:
      monitoring:
        isEnabled: true
Configuring Perflib for BRM Opcode Monitoring
The BRM cloud native deployment package includes the BRM Performance Profiling Toolkit (Perflib), which the Connection Manager (CM), Oracle Data Manager (DM), Synchronization Queue DM, and Account Synchronization DM depend on for generating and exposing BRM opcode metrics.
You configure how Perflib generates the metric data by setting environment variables in the following ConfigMaps:
- For the CM: the oms-cm-perflib-config ConfigMap
- For the DMs: the oms-dm-oracle-perflib-config ConfigMap
Table 11-1 describes the environment variables you can use to configure Perflib for the CM and DMs.
Table 11-1 Perflib Environment Variables
Environment Variable | Description
---|---
PERFLIB_ENABLED | Whether to enable opcode monitoring with Perflib.
PERFLIB_HOME | The location of the Perflib Toolkit.
PERFLIB_DEBUG | The debug log level for Perflib.
PERFLIB_MAX_LOG_SIZE | The maximum number of opcodes that can be logged in one log file. You can use this to prevent huge log files if detailed tracing is used for long periods.
PERFLIB_AGGREGATION_PERIOD | The amount of time that data is recorded into a bucket, in minutes or hours. When that time expires, Perflib creates a new bucket. For example, each bucket could record 1 hour, 2 hours, or 5 minutes of data. The allowed values for hours are 1h, 2h, 3h, 4h, 6h, 8h, 12h, and 24h. The allowed values for minutes are 1m, 2m, 3m, 4m, 5m, 6m, 10m, 12m, 15m, 30m, and 60m. The default is 1h.
PERFLIB_FLUSH_FREQUENCY | How frequently, in seconds, to flush in-memory aggregation data to trace files on disk. The default is 3600 (1 hour).
PERFLIB_LOG_SINGLE_FILE | The prefix for tracing file names, such as cm_batch, cm_aia, or cm_rt. This allows you to distinguish the trace files for each type of application. The default is perf_log.
PERFLIB_PIN_SHLIB | The full path of the shared library that contains the BRM opcode functions being interposed. This environment variable is used for the CM only. The default is /oms/lib/libcmpin.so.
PERFLIB_DATA_FILE | The full path of the memory-mapped data file Perflib uses to store control variables and real-time trace data. Special formatting characters can be used as part of the data file name and are substituted by Perflib when the data file is created. The default is /oms_logs/perflib_data.dat.
PERFLIB_LOG_DIR | The directory where trace output is written. The default is /oms_logs.
PERFLIB_DATA_FILE_RESET | Whether real-time tracing data and variable settings are maintained between application executions. This enables statistics to continue to accumulate across an application restart.
PERFLIB_VAR_TIME | Whether Perflib tracing is activated immediately.
PERFLIB_VAR_FLIST | Whether Perflib flist tracing is activated immediately.
PERFLIB_VAR_ALARM | Whether the Perflib alarm functionality is activated immediately.
PERFLIB_AUTO_FLUSH | Whether the CM flushes data regularly (with the frequency set by PERFLIB_FLUSH_FREQUENCY). This environment variable is used for the CM only.
PERFLIB_COLLECT_CPU_USAGE | Whether user and system CPU usage is tracked at the opcode level, allowing CPU-heavy opcodes to be identified more easily.
PERFLIB_LOCK_METHOD | The method used for locking between processes.
PERFLIB_ASYNC_FLUSHING | Whether flushing from memory to the trace file is done within the opcode execution or asynchronously in a separate thread.
PERFLIB_TRACE_OBJECT_TYPE | Whether Perflib records the BRM object type associated with different database operations, such as PCM_OP_SEARCH, PCM_OP_READ_FLDS, and PCM_OP_WRITE_FLDS. This can help you understand which objects are read or written most frequently and how much time is spent on different objects. For PCM_OP_EXEC_SPROC, the latest versions of Perflib record the name of the stored procedure that was run.
PERFLIB_GROUP_TRANSACTIONS | Whether Perflib tracks BRM transactions as a single unit. The opcodes run as part of a transaction are grouped under a virtual opcode, TRANSACTION_GROUP.
PERFLIB_LOG_MAX_SINGLE_FILE_SIZE | The threshold file size, in bytes, at which a new single log file is created (it works only with the PERFLIB_LOG_SINGLE_FILE parameter). Whenever a flush of aggregate timing data causes the configured size to be exceeded, the log file is renamed and a new file is created for subsequent data. For example, 5242880 is equivalent to 5 MB. If the parameter is not defined or is set to 0, the file size defaults to 1 GB.
PERFLIB_ALARM_CONFIG_FILE | How Perflib handles alarms. Perflib provides an example alarm file, alarm_config.lst, which shows how operation-specific configurations can be done.
PERFLIB_ALARM | The general alarm that triggers the logging of information about any opcode call that exceeds a particular elapsed time.
ENABLE_PROCESS_METRICS | Whether Prometheus generates metrics for the Oracle DM shared memory, front-end processes, and back-end processes.
PERFLIB_LOG_CORRELATION_IN_CALL_STACK | Whether Perflib adds the BRM correlation ID to call-stack logs.
PERFLIB_FLIST_LOG_TO_STDOUT | Whether Perflib writes flist logs to standard output.
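As an illustration only, a fragment of the oms-cm-perflib-config ConfigMap that sets some of the variables from Table 11-1 could look like the following sketch. The values shown are examples, not recommendations; review them against your own environment before applying:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: oms-cm-perflib-config        # the CM Perflib ConfigMap named in this chapter
data:
  PERFLIB_ENABLED: "true"
  PERFLIB_AGGREGATION_PERIOD: "1h"   # example: one-hour buckets (the default)
  PERFLIB_FLUSH_FREQUENCY: "3600"    # flush aggregation data to trace files hourly
  PERFLIB_LOG_DIR: "/oms_logs"
  ENABLE_PROCESS_METRICS: "true"     # also expose Oracle DM process metrics
```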
Configuring OracleDB_Exporter to Scrape Database Metrics
You use OracleDB_Exporter to scrape metrics from your BRM database and export them to Prometheus. Prometheus can then read the metrics, and Grafana can display them in graphical dashboards.
To configure OracleDB_Exporter to scrape and export metrics from your BRM database:
1. Download and install the following external applications:
   - OracleDB_exporter. See https://github.com/iamseth/oracledb_exporter on the GitHub website.
   - Oracle database client.
   For the list of compatible software versions, see "BRM Cloud Native Deployment Software Compatibility" in BRM Compatibility Matrix.
2. Specify the BRM database metrics to scrape and export in the Exporter_home/default-metrics.toml file, where Exporter_home is the directory in which you deployed OracleDB_Exporter.
   For more information, see https://github.com/iamseth/oracledb_exporter/blob/master/README.md on the GitHub website.
3. Open your override-values.yaml file for Prometheus.
4. Configure Prometheus to fetch performance data from OracleDB_exporter. To do so, copy and paste the following into your override-values.yaml file, replacing hostname with the host name of the machine on which OracleDB_exporter is deployed:

   scrape_configs:
     - job_name: 'oracledbexporter'
       static_configs:
         - targets: ['hostname:9161']

5. Save and close the file.
6. Run the helm upgrade command to update your Prometheus Helm chart release.

The metrics for your BRM database are available at http://hostname:9161/metrics.
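Entries in default-metrics.toml follow the exporter's TOML metric format. As a sketch only (the context name and SQL query here are illustrative, not taken from BRM):

```toml
[[metric]]
context = "brm_db_sessions"   # hypothetical metric context
labels = ["status"]
metricsdesc = { value = "Count of BRM database sessions by status." }
request = "SELECT status, COUNT(*) AS value FROM v$session GROUP BY status"
```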
Configuring Grafana for BRM Cloud Native
You can create a dashboard in Grafana to display the metric data for your BRM cloud native services.
Alternatively, you can use the sample dashboards included in the oc-cn-docker-files-15.0.x.0.0.tgz package. To use the sample dashboards, import the dashboard files from the oc-cn-docker-files/samples/monitoring/ directory into Grafana. See "Export and Import" in the Grafana Dashboards documentation.
Table 11-2 describes each sample dashboard.
Table 11-2 Sample Grafana Dashboards
File Name | Description
---|---
oc-cn-applications-dashboard.json | Provides a high-level view of all installed BRM components, grouped by whether they are running or have failed.
ocbrm-batch-controller-dashboard.json | Shows JVM-related metrics for the Batch Controller.
ocbrm-cm-dashboard.json | Shows CPU and opcode-level metrics for the CM.
ocbrm-dm-oracle-dashboard.json | Shows opcode-level, CPU usage, and memory usage metrics for the Oracle DM.
ocbrm-dm-oracle-shm-dashboard.json | Shows shared memory, front-end process, and back-end process metrics for the Oracle DM.
ocbrm-eai-js-dashboard.json | Shows JVM and opcode-related metrics for the EAI JS.
ocbrm-overview-dashboard.json | Shows metrics for BRM services at the pod, container, network, and input-output level.
ocbrm-rel-dashboard.json | Shows JVM-related metrics for Rated Event (RE) Loader.
ocbrm-rem-dashboard.json | Shows metrics for Rated Event Manager (REM).
ocbrm-remtable-dashboard.json | Shows table metrics for Rated Event Manager (REM).
ocbrm-wsm-weblogic-server-dashboard.json | Shows metrics for Web Services Manager.
Note:
For the sample dashboard to work properly, the data source name for the WebLogic Domain must be Prometheus.
You can also configure Grafana to send alerts to your dashboard, an email address, or Slack when a problem occurs. For example, you could configure Grafana to send an alert when an opcode exceeds a specified number of errors. For information about setting up alerts, see "Grafana Alerts" in the Grafana documentation.
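For instance, the opcode-error condition mentioned above could be evaluated with a PromQL expression like the one in the following sketch of a Prometheus alerting rule. The group name, alert name, threshold, and durations are placeholders; a Grafana alert can evaluate the same expression:

```yaml
groups:
  - name: brm-opcode-alerts            # placeholder group name
    rules:
      - alert: BrmOpcodeErrorRateHigh  # placeholder alert name
        expr: rate(brm_opcode_errors_total[5m]) > 0.1   # example threshold
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "BRM opcode error rate is elevated"
```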
BRM Opcode Metric Group
Use the BRM opcode metric group to retrieve runtime information for BRM opcodes. Table 11-3 lists the metrics in this group.
Table 11-3 BRM Opcode Metrics
Metric Name | Metric Type | Metric Description | Pod
---|---|---|---
brm_opcode_calls_total | Counter | The total number of calls for a BRM opcode. | cm, dm-oracle
brm_opcode_errors_total | Counter | The total number of errors when executing a BRM opcode. | cm, dm-oracle
brm_opcode_exec_time_total | Counter | The total time taken to run a BRM opcode. | cm, dm-oracle
brm_opcode_user_cpu_time_total | Counter | The total CPU time taken to run the BRM opcode in user space. | cm, dm-oracle
brm_opcode_system_cpu_time_total | Counter | The total CPU time taken to run the BRM opcode in kernel space. | cm, dm-oracle
brm_opcode_records_total | Counter | The total number of records returned by the BRM opcode execution. | cm, dm-oracle
brm_dmo_shared_memory_used_current | Gauge | The number of shared memory blocks currently used by dm_oracle. | cm
brm_dmo_shared_memory_used_max | Counter | The maximum number of shared memory blocks used by dm_oracle. | cm
brm_dmo_shared_memory_free_current | Gauge | The number of free shared memory blocks available to dm_oracle. | cm
brm_dmo_shared_memory_hwm | Gauge | The shared memory high watermark for dm_oracle. | cm
brm_dmo_shared_memory_bigsize_used_max | Counter | The maximum big-size shared memory used by dm_oracle, in bytes. | cm
brm_dmo_shared_memory_bigsize_used_current | Gauge | The total big-size shared memory used by dm_oracle, in bytes. | cm
brm_dmo_shared_memory_bigsize_hwm | Gauge | The big-size shared memory high watermark for dm_oracle, in bytes. | cm
brm_dmo_front_end_connections_total | Gauge | The total number of connections for a dm_oracle front-end process. | cm
brm_dmo_front_end_max_connections_total | Counter | The maximum number of connections for a dm_oracle front-end process. | cm
brm_dmo_front_end_trans_done_total | Counter | The total number of transactions handled by the dm_oracle front-end process. | cm
brm_dmo_front_end_ops_done_total | Counter | The total number of operations handled by the dm_oracle front-end process. | cm
brm_dmo_back_end_ops_done_total | Counter | The total number of operations done by the dm_oracle back-end process. | cm
brm_dmo_back_end_ops_error_total | Counter | The total number of errors encountered by the dm_oracle back-end process. | cm
brm_dmo_back_end_trans_done_total | Counter | The total number of transactions handled by the dm_oracle back-end process. | cm
brm_dmo_back_end_trans_error_total | Counter | The total number of transaction errors in the dm_oracle back-end process. | cm
com_portal_js_JSMetrics_CurrentConnectionCount | Counter | The current number of concurrent connections to the Java Server from the CM. | cm (eai-java-server)
com_portal_js_JSMetrics_MaxConnectionCount | Counter | The maximum number of concurrent connections to the Java Server from the CM. | cm (eai-java-server)
com_portal_js_JSMetrics_SuccessfulOpcodeCount | Counter | The number of opcodes called from the CM whose execution succeeded in the Java Server. | cm (eai-java-server)
com_portal_js_JSMetrics_FailedOpcodeCount | Counter | The number of opcodes called from the CM whose execution failed in the Java Server. | cm (eai-java-server)
com_portal_js_JSMetrics_TotalOpcodeCount | Counter | The total number of opcodes called from the CM. | cm (eai-java-server)
com_portal_js_JSMetrics_TotalOpcodeExecutionTime | Counter | The total time taken, in milliseconds, across all opcodes. | cm (eai-java-server)
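Because most of the metrics above are Counters that only increase, dashboards work with deltas between scrapes (PromQL's rate() function does this). The following sketch shows how average opcode latency can be derived from two successive samples of brm_opcode_calls_total and brm_opcode_exec_time_total; the sample numbers are made up for illustration.

```python
# Sketch: average execution time per opcode call over a scrape interval,
# derived from two successive samples of the cumulative counters in
# Table 11-3. Sample values below are illustrative only.

def avg_exec_time(calls_t0, calls_t1, exec_time_t0, exec_time_t1):
    """Average execution time per call between two counter samples."""
    calls = calls_t1 - calls_t0
    if calls <= 0:
        return 0.0  # no calls in the interval (or a counter reset)
    return (exec_time_t1 - exec_time_t0) / calls

# 200 calls consumed 10 time units during the interval
print(avg_exec_time(1000, 1200, 50.0, 60.0))  # prints 0.05
```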