Batch Metrics Reference

Find out about the metrics emitted by the Batch service.

You can monitor the health, capacity, and performance of Batch resources using metrics, alarms, and notifications.

This topic describes the metrics emitted by Batch in the oci_batch metric namespace.

Batch Metrics Overview

Batch metrics help you monitor batch workloads at the batch context, batch job pool, and batch job levels. You can use these metrics for diagnosing and troubleshooting usage, health, and performance issues across resources, including jobs, tasks, cores, RAM, and custom resource (entitlements) usage.

By default, service metrics are typically posted every 60 seconds (at least one data point per minute).

To view batch metrics in the Console, select the batch resource, then use the Monitoring tab. You can also use the Monitoring service to create custom metric queries. See Building Metric Queries.
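As a rough illustration of a custom metric query, the following Python sketch assembles a Monitoring Query Language (MQL) expression for a Batch metric. The helper name build_mql and its parameters are hypothetical and are not part of any SDK. The dimension value "ACCEPTED" is taken from the lifecycle state examples in this topic, so confirm the exact values your resources emit before relying on it.

```python
def build_mql(metric, interval="1m", dimensions=None, statistic="sum"):
    """Build an MQL expression string (hypothetical helper, not an SDK call).

    The result follows the general MQL shape:
        metric[interval]{dimension = "value", ...}.statistic()
    """
    filters = ""
    if dimensions:
        # Each dimension filter is written as: name = "value"
        pairs = ", ".join(
            f'{name} = "{value}"' for name, value in dimensions.items()
        )
        filters = "{" + pairs + "}"
    return f"{metric}[{interval}]{filters}.{statistic}()"


# Example: total jobs in the ACCEPTED state, aggregated per minute.
query = build_mql("JobCount", dimensions={"lifecycleState": "ACCEPTED"})
print(query)
```

The resulting string is intended to be used with the oci_batch metric namespace, for example in the Monitoring service query editor or as the query text of a metric data request made through an SDK or the CLI.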

Prerequisites

IAM policies: To monitor resources, you must be granted the required type of access in a policy written by an administrator, whether you're using the Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to both the monitoring services and the resources being monitored. If you try to perform an action and get a message that you don't have permission or are unauthorized, contact the administrator to find out what type of access you were granted and which compartment you need to work in. For more information about user authorizations for monitoring, see IAM Policies.

Available Metrics: oci_batch

The metrics listed in the following tables are automatically available for any batch resources you create. You don't need to enable monitoring on the resource to get these metrics.

Batch metrics include the following dimensions:

RESOURCEID
The OCID of the resource to which the metric applies.
RESOURCEDISPLAYNAME
The name of the resource to which the metric applies.
JOBPOOLID
The OCID of the associated job pool.
JOBPOOLDISPLAYNAME
The display name of the job pool.
LIFECYCLESTATE
The state of the job or task (for example, ACCEPTED, IN PROGRESS, or WAITING).
FLEETNAME
The name of the fleet configuration (for fleet-related metrics).
ENTITLEMENTNAME
The name of the customer-defined entitlement.
Metric | Metric Display Name | Unit | Description | Dimensions
JobCount | Jobs Count | count | Number of jobs in each lifecycle state in BatchContext or JobPool. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName, lifecycleState
JobResult | Job Result | Boolean | For each succeeded, canceled, or failed job, emits 0/1 per job. Used to track job completion and failure rates. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName, lifecycleState
TaskCount | Task Count | count | Number of tasks in each lifecycle state across BatchContext and JobPool. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName, lifecycleState
TaskOnFleetCount | Tasks on Fleets | count | Number of tasks in applicable states per assigned fleet. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName, fleetName
EntitlementsUsage | Entitlements | count | Number of customer-defined entitlements in use. | resourceId, resourceDisplayName, entitlementsName
CoreUsage | Cores | OCPUs | Number of OCPUs currently in use. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName
MemUsageinGB | Memory Usage | GB | Amount of RAM used by farm or job pool. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName
DiskSizeinGB | Disk Size | GB | Total disk size currently in use. | resourceId, resourceDisplayName, jobPoolId, jobPoolDisplayName
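Because not every dimension applies to every metric, it can help to check a planned filter against the table before building a query. The following Python sketch encodes the metric-to-dimension mapping from the table above. The names METRIC_DIMENSIONS and unsupported_dimensions are hypothetical, and the dimension spellings are copied from the table rows as given, so verify them against the data your resources actually emit.

```python
# Dimensions listed for each oci_batch metric, copied from the table above.
COMMON = {"resourceId", "resourceDisplayName"}
JOB_POOL = COMMON | {"jobPoolId", "jobPoolDisplayName"}

METRIC_DIMENSIONS = {
    "JobCount": JOB_POOL | {"lifecycleState"},
    "JobResult": JOB_POOL | {"lifecycleState"},
    "TaskCount": JOB_POOL | {"lifecycleState"},
    "TaskOnFleetCount": JOB_POOL | {"fleetName"},
    # Spelling taken from the table row; the dimension list in this topic
    # spells it ENTITLEMENTNAME, so confirm which form is emitted.
    "EntitlementsUsage": COMMON | {"entitlementsName"},
    "CoreUsage": JOB_POOL,
    "MemUsageinGB": JOB_POOL,
    "DiskSizeinGB": JOB_POOL,
}


def unsupported_dimensions(metric, requested):
    """Return the requested dimensions that the table does not list for metric."""
    return sorted(set(requested) - METRIC_DIMENSIONS[metric])


# fleetName is listed only for TaskOnFleetCount, so it is flagged for CoreUsage.
print(unsupported_dimensions("CoreUsage", ["fleetName", "jobPoolId"]))
```

A filter on a dimension that a metric does not carry is likely to match no data, so an empty result from this check is a reasonable precondition before running a query.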