Implementing Automatic Rollback
Use scripts to implement automatic rollback when an apply job fails in Resource Manager. Automatic rollback scripts involve monitoring for job failures and defining a rollback procedure that includes validation, custom triggers, and actions.
In production environments, a robust and flexible deployment strategy is essential. A common practice is to integrate OCI Resource Manager with a continuous integration and continuous delivery (CI/CD) system that manages the full deployment lifecycle, including automatic rollback.
Oracle Cloud Infrastructure (OCI) offers a native CI/CD platform, OCI DevOps, which provides pipelines for orchestrating deployment, testing, monitoring, and rollback operations.
Automatic Rollback Script
To help you get started, we've provided the following sample Bash script, which can be used in an OCI DevOps deployment pipeline. This script demonstrates a mechanism for automatically rolling back an OCI Resource Manager stack in the event of a deployment failure.
Treat the script as a starting point and customize it to fit your deployment workflows.
version: 0.1
component: command
timeoutInSeconds: 600
shell: bash
steps:
  - type: Command
    name: "Deployment Runner Functionality"
    command: |
      # The STACK_OCID variable is externally injected and remains unescaped.
      STACK_OCID="${STACK_OCID}"
      echo "Starting Resource Manager Apply Job for Stack: ${STACK_OCID}"

      # 1. Create the Apply Job and capture its OCID.
      job_id=$(
        oci resource-manager job create-apply-job \
          --execution-plan-strategy AUTO_APPROVED \
          --display-name "DevOps-Apply-$(date +%s)" \
          --stack-id "${STACK_OCID}" \
          --wait-for-state 'ACCEPTED' \
          --query 'data.id' \
          --raw-output
      )

      # Check if the job creation command returned a valid ID
      if [ -z "$job_id" ]; then
        echo "ERROR: Failed to create Resource Manager Apply job or command failed."
        exit 1
      fi
      echo "Resource Manager Job OCID: $job_id"

      # 2. Polling Loop to monitor job status
      max_poll_time_seconds=600 # 10 minutes maximum wait time
      poll_interval_seconds=10
      elapsed_time=0
      while [ "$elapsed_time" -le "$max_poll_time_seconds" ]; do
        # Get the current job status.
        status=$(
          oci resource-manager job get \
            --job-id "$job_id" \
            --query 'data."lifecycle-state"' \
            --raw-output
        )
        echo "Time Elapsed: ${elapsed_time}s / ${max_poll_time_seconds}s - Current Job Status: $status"

        if [ "$status" == "SUCCEEDED" ]; then
          echo "Resource Manager Apply Job SUCCEEDED."
          exit 0 # Success exit code
        elif [ "$status" == "FAILED" ] || [ "$status" == "CANCELED" ]; then
          echo "Resource Manager Apply Job FAILED or CANCELED."

          # Check failure reason and conditionally trigger rollback
          if [ "$status" == "FAILED" ]; then
            # Fetch failure code
            failure_code=$(
              oci resource-manager job get \
                --job-id "$job_id" \
                --query 'data."failure-details".code' \
                --raw-output
            )
            if [ -z "$failure_code" ]; then
              echo "No failure-details.code found in job output (field is empty or missing)"
              echo "==== Full job JSON output for debugging: ===="
              oci resource-manager job get \
                --job-id "$job_id"
            else
              echo "Failure code: $failure_code"
            fi

            if [ "$failure_code" == "TERRAFORM_EXECUTION_ERROR" ]; then
              echo "Detected Terraform configuration error. Starting automatic rollback process..."

              # Define a rollback function
              automatic_rollback() {
                # List all jobs, filter for successful Apply jobs, sort by time-created
                # (newest first), and get the latest job OCID.
                # Part 1: Fetch and count all succeeded APPLY jobs
                matched_jobs=$(
                  oci resource-manager job list \
                    --stack-id "$STACK_OCID" \
                    --query 'data[?operation==`APPLY` && "lifecycle-state"==`SUCCEEDED`] | reverse(sort_by(@, &"time-created"))'
                )
                job_count=$(echo "$matched_jobs" | jq 'length')
                echo "Number of matching jobs found: $job_count"

                # Part 2: Extract and print first (most recent) OCID if any
                last_succeeded_apply_job_ocid=$(echo "$matched_jobs" | jq -r '.[0].id // empty')
                if [ -n "$last_succeeded_apply_job_ocid" ]; then
                  echo "Last succeeded apply job OCID: $last_succeeded_apply_job_ocid"
                else
                  echo "No previous successful apply jobs found. Rollback skipped."
                  return 0
                fi

                echo "Invoking OCI CLI to create rollback job for OCID: $last_succeeded_apply_job_ocid"
                oci resource-manager job create \
                  --from-json "{\"stackId\":\"${STACK_OCID}\",\"displayName\":\"DevOps-Apply-Rollback-$(date +%s)\",\"jobOperationDetails\":{\"operation\":\"APPLY_ROLLBACK\",\"executionPlanRollbackStrategy\":\"AUTO_APPROVED\",\"targetRollbackJobId\":\"$last_succeeded_apply_job_ocid\"}}"
                echo "======Apply Rollback Job Creation Complete========="
              }

              # Call the rollback function
              automatic_rollback
            fi
          fi
          exit 1 # Failure exit code
        fi

        # Wait before polling again
        sleep "$poll_interval_seconds"
        elapsed_time=$((elapsed_time + poll_interval_seconds))
      done

      # If the loop finished without SUCCEEDED or FAILED status
      echo "Error: Resource Manager Apply Job timed out after ${max_poll_time_seconds} seconds."
      exit 1
The automatic rollback logic implemented in the script includes the following steps:
- Initiate Deployment: Create an apply job to deploy the updated Terraform configuration.
- Monitor Deployment Status: Poll the status of the apply job until it succeeds, fails, is canceled, or the polling timeout elapses. See Getting a Job's Details.
- Evaluate Failure: If the apply job fails, evaluate predefined criteria to determine whether to trigger an automatic rollback. The sample script triggers a rollback only when the job's failure code is TERRAFORM_EXECUTION_ERROR.
- Identify Stable State: Retrieve the successful apply jobs for the stack and select the job to roll back to; the sample script uses the most recent one. See Listing Jobs.
- Trigger Rollback: Create an apply rollback job to restore the stack to the previously stable state.
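When customizing the script, it can help to pull the "Monitor Deployment Status" and "Evaluate Failure" steps into small functions that can be exercised without calling OCI. The following is a minimal sketch under stated assumptions, not part of the sample script: the function names `should_rollback` and `wait_for_terminal_state` and the `ROLLBACK_FAILURE_CODES` variable are hypothetical, and the only failure code taken from the sample is `TERRAFORM_EXECUTION_ERROR`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not part of the OCI CLI.

# Space-separated failure codes that should trigger a rollback.
# Assumption: defaults to the single code checked by the sample script.
ROLLBACK_FAILURE_CODES="${ROLLBACK_FAILURE_CODES:-TERRAFORM_EXECUTION_ERROR}"

# should_rollback STATUS FAILURE_CODE
# Returns 0 only when STATUS is FAILED and FAILURE_CODE is in the list above.
should_rollback() {
  local status="$1" failure_code="$2" code
  [ "$status" = "FAILED" ] || return 1
  for code in $ROLLBACK_FAILURE_CODES; do
    [ "$failure_code" = "$code" ] && return 0
  done
  return 1
}

# wait_for_terminal_state MAX_SECONDS INTERVAL_SECONDS STATUS_COMMAND [ARGS...]
# Runs STATUS_COMMAND until it prints SUCCEEDED, FAILED, or CANCELED, then
# prints that state and returns 0. Prints TIMED_OUT and returns 1 otherwise.
wait_for_terminal_state() {
  local max="$1" interval="$2" elapsed=0 status
  shift 2
  while [ "$elapsed" -lt "$max" ]; do
    status="$("$@")"
    case "$status" in
      SUCCEEDED|FAILED|CANCELED)
        echo "$status"
        return 0
        ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "TIMED_OUT"
  return 1
}
```

In a pipeline, the status command passed to `wait_for_terminal_state` could be a small wrapper function around the `oci resource-manager job get` call used in the sample script, and the result of `should_rollback` could gate the call to `automatic_rollback`. Keeping the OCI calls outside these helpers means the decision logic can be tested locally with stub commands.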