Management Agent for Kubernetes (OCMA) in Failed State After Upgrade Failure

If the Docker image URL specified for the management agent is incorrect or inaccessible during a Helm upgrade of the oci-kubernetes-monitoring chart, the management agent pod remains in a failed state.

In this state, a subsequent Helm upgrade with the correct image URL does not recover the pod on its own, because Kubernetes does not automatically recreate a pod that is stuck in a failed state after an image pull error.
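To confirm that the pod is in this condition before applying the fix, it can help to print its phase, its configured image, and the reason the container is waiting. The following is a minimal sketch, not part of the chart: the function name `show_agent_pod_status` is hypothetical, the default namespace and pod name are the ones used in this section, and it assumes the agent is the first container in the pod.

```shell
# Hypothetical helper: print "<phase> <image> <waiting reason>" for the agent pod.
# Defaults match the names used in this section; pass others if your release differs.
show_agent_pod_status() (
  ns="${1:-oci-onm}"
  pod="${2:-oci-onm-mgmt-agent-0}"
  # Assumes the management agent is the first container in the pod spec.
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.phase}{"\t"}{.spec.containers[0].image}{"\t"}{.status.containerStatuses[0].state.waiting.reason}{"\n"}'
)
```

Run `show_agent_pod_status` with no arguments to use the defaults. A waiting reason such as `ErrImagePull` or `ImagePullBackOff` suggests the image could not be pulled, and the image column shows which URL the existing pod is still using.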

To resolve this issue:

  1. Upgrade the Helm release with the correct, accessible image URL:
    helm upgrade <release-name> --values <path-to-override-values.yaml> <path-to-helm-chart>
  2. Delete the failed pod so that Kubernetes can recreate it with the corrected image URL:
    kubectl delete pod oci-onm-mgmt-agent-0 -n oci-onm

After deletion, Kubernetes will automatically recreate the pod using the corrected configuration, and the pod should start successfully.
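The two steps above, followed by a readiness check, can be combined into one helper. This is a sketch under stated assumptions rather than a supported script: the function name `recover_mgmt_agent` and its argument order are hypothetical, the release name, values file, and chart path are supplied by the caller, and the retry count and timeouts are arbitrary illustrative values.

```shell
# Hypothetical helper: upgrade the release, delete the stuck pod, wait for the new one.
# Usage: recover_mgmt_agent <release-name> <values.yaml> <chart-path> [namespace] [pod]
recover_mgmt_agent() (
  [ "$#" -ge 3 ] || {
    echo "usage: recover_mgmt_agent <release-name> <values.yaml> <chart-path> [namespace] [pod]" >&2
    exit 2
  }
  release="$1"; values="$2"; chart="$3"
  ns="${4:-oci-onm}"
  pod="${5:-oci-onm-mgmt-agent-0}"

  # Step 1: apply the corrected image URL from the override values file.
  helm upgrade "$release" --values "$values" "$chart" || exit 1

  # Step 2: delete the failed pod so that it is recreated from the updated spec.
  kubectl delete pod "$pod" -n "$ns" || exit 1

  # The replacement pod may not exist immediately, so retry the readiness wait.
  # Five attempts and a 60s timeout are illustrative values, not recommendations.
  attempt=0
  until kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout=60s; do
    attempt=$((attempt + 1))
    [ "$attempt" -lt 5 ] || exit 1
    sleep 5
  done
)
```

For example, `recover_mgmt_agent <release-name> <path-to-override-values.yaml> <path-to-helm-chart>` runs both steps with the default namespace and pod name from this section. A non-zero exit status means one of the steps failed or the pod did not become ready within the retries.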