5.8 Troubleshooting the Installation

This section provides troubleshooting tips for installing OAA, OARM, and OUA.

5.8.1 Problems Running installManagementContainer.sh

This section provides troubleshooting tips for problems while running installManagementContainer.sh.

Podman issues during OAA Management Container installation

  • Podman fails to load the OAA images in the tar file due to image or file format errors. For example:
    Storing signatures
    Getting image source signatures
    Copying blob 01092b6ac97d skipped: already exists
    Copying blob dba9a6800748 skipped: already exists
    Copying blob bae273a35c58 skipped: already exists
    Copying blob 7f4b55b885b0 skipped: already exists
    Copying blob 93e8a0807a49 skipped: already exists
    Copying blob fa5885774604 skipped: already exists
    Copying blob 3b8528487f10 skipped: already exists
    Copying blob 3a1c2e3e35f4 [==========================>-----------] 213.8MiB / 298.1MiB
    Copying blob 6d31843e131e [=================================>----] 210.5MiB / 236.5MiB
    Copying blob f35b9630ef38 [===========>--------------------------] 213.8MiB / 672.2MiB
    Copying blob ef894c2768e3 done
    Copying blob 846fd069f886 [==========>---------------------------] 197.7MiB / 672.2MiB
    Copying blob 257c48b76c82 done
    Error: payload does not match any of the supported image formats (oci, oci-archive, dir, docker-archive)
    This can happen when the root partition of the installation host is short on free space (podman stores temporary files under /var/tmp), or when the podman version is earlier than 3.3.0. If this error occurs, address the underlying issue, remove all files under /var/tmp, and then retry the installation.
  • Podman fails to load the OAA images in the tar file due to permissions issues. For example:
    Using image release files ./releaseimages.txt and ./nonreleaseimages.txt...
    tee: ./oaainstall-tmp/run.log: Permission denied
    Using install settings from ./installOAA.properties.
    tee: ./oaainstall-tmp/run.log: Permission denied
    Checking kubectl client version...
    WARNING: version difference between client (1.23) and server (1.21) exceeds
    the supported minor version skew of +/-1
    tee: ./oaainstall-tmp/run.log: Permission denied
    kubectl version required major:1 minor:18, version detected major:1 minor:23
    tee: ./oaainstall-tmp/run.log: Permission denied

    This may happen if you extract the zip file as one user and run installManagementContainer.sh as a different user who does not have the required permissions. In this situation, remove the $WORKDIR/oaaimages/oaa-install/oaainstall-tmp directory and retry the installation as the same user who extracted the zip file.

  • Podman failed to load the OAA images in a previous installation attempt and now does not pull, tag, and push all of the required images. In this situation, remove the $WORKDIR/oaaimages/oaa-install/oaainstall-tmp directory and retry.
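
The two conditions named in the first item above (free space under /var/tmp and the podman version) can be checked before retrying. The following is a minimal sketch and not part of the OAA installer: the function names and the MIN_FREE_MIB threshold are assumptions chosen for illustration, and version_ge relies on the -V option of GNU sort.

```shell
# Returns 0 if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the free space, in MiB, of the filesystem that holds directory $1.
free_mib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Hypothetical pre-flight check. MIN_FREE_MIB is an assumed threshold,
# not a documented OAA requirement; adjust it to the size of your image tar.
preflight_podman() {
    min_free_mib="${MIN_FREE_MIB:-4096}"
    podman_version="$(podman version --format '{{.Client.Version}}')" || return 1
    if ! version_ge "$podman_version" 3.3.0; then
        echo "podman $podman_version is earlier than 3.3.0" >&2
        return 1
    fi
    if [ "$(free_mib /var/tmp)" -lt "$min_free_mib" ]; then
        echo "less than $min_free_mib MiB free under /var/tmp" >&2
        return 1
    fi
}
```

Calling preflight_podman before installManagementContainer.sh reports which of the two conditions, if any, is not met. It does not clean /var/tmp; that step is still performed manually.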

OAA Management chart installation failure

If the OAA management chart installation fails with the following:
Executing 'helm install ...  oaamgmt charts/oaa-mgmt'.
Continue? [Y/N]:
y
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0]): unknown field "volumMounts" in io.k8s.api.core.v1.Container
it is likely that the manifest files for the OAA management chart got corrupted. Copy installOAA.properties, cert.p12, and trust.p12 to a safe location, remove the install directory $WORKDIR/oaaimages/oaa-install, extract the <OAA_Image>.zip and restart the installation.
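
The backup step above can be sketched as follows. This is an illustration only: backup_install_files is a hypothetical helper, and the directory that currently holds the three files is passed in by the caller because it depends on your environment.

```shell
# Copies installOAA.properties, cert.p12, and trust.p12 from directory $1
# into directory $2. Nothing is removed; missing files only raise a warning.
backup_install_files() {
    src_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    for f in installOAA.properties cert.p12 trust.p12; do
        if [ -f "$src_dir/$f" ]; then
            cp -p "$src_dir/$f" "$dest_dir/" || return 1
        else
            echo "warning: $src_dir/$f not found, skipping" >&2
        fi
    done
}
```

Run the helper and confirm that the copies exist before removing $WORKDIR/oaaimages/oaa-install.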

Installation script times out waiting for OAA Management Container pod to start

If you see the following error:
NAME                                     READY   STATUS              RESTARTS   AGE
oaamgmt-oaa-mgmt-74c9ff789d-wq82h   0/1     ContainerCreating   0          2m3s
Waiting 15 secs for OAA mgmt deployment to run...
Executing 'kubectl get pods oaamgmt-oaa-mgmt-74c9ff789d-wq82h -n oaans'...
NAME                                     READY   STATUS              RESTARTS   AGE
oaamgmt-oaa-mgmt-74c9ff789d-wq82h   0/1     ContainerCreating   0          2m18s
Waiting 15 secs for OAA mgmt deployment to run...
...
OAA mgmt pod is not running after 450 secs, cannot proceed with install.
Critical error, exiting. Check ./oaainstall-tmp/run.log for additional information.
then run the following commands to get additional information:
$ kubectl get pods -n oaans
$ kubectl describe pod oaamgmt-<pod> -n oaans
  • In case of NFS errors, verify that the NFS volume information in installOAA.properties is correct. In this situation kubectl describe will show the following:
    Output: mount.nfs: mounting <ipaddress>:/scratch/oaa/scripts-creds failed, reason given by server: No such file or directory
      Warning  FailedMount  15s  kubelet, <ipaddress>  Unable to attach or mount volumes: unmounted volumes=[oaamgmt-oaa-mgmt-configpv oaamgmt-oaa-mgmt-credpv oaamgmt-oaa-mgmt-logpv], unattached volumes=[oaamgmt-oaa-mgmt-configpv oaamgmt-oaa-mgmt-credpv oaamgmt-oaa-mgmt-logpv oaamgmt-oaa-mgmt-vaultpv default-token-rsh62]: timed out waiting for the condition
  • In case of image pull errors, verify that the image pull secret (dockersecret) was created correctly, and that the properties install.global.repo, install.global.image.tag, and install.global.imagePullSecrets[0].name in installOAA.properties are correct. In this situation kubectl describe pod will show the following:
    Warning  Failed     21s (x3 over 61s)  kubelet, <ipaddress>  Error: ErrImagePull
    Normal   BackOff    7s (x3 over 60s)   kubelet, <ipaddress>  Back-off pulling image "container-registry.example.com/oracle/shared/oaa-mgmt:<tag>"
    Warning  Failed     7s (x3 over 60s)   kubelet, <ipaddress>  Error: ImagePullBackOff
  • In case of timeouts with no apparent error, the cluster may have taken too long to download the OAA management image. In this case the management pod eventually starts, but the installation aborts. If this happens, delete the OAA management helm release using helm delete oaamgmt -n oaans and rerun the installation script.
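
The three cases above can be told apart by searching the kubectl describe output for the strings shown in the examples. The following is a minimal sketch: classify_pod_events is a hypothetical helper, and it recognizes only the messages quoted in this section.

```shell
# Reads `kubectl describe pod` output on stdin and prints one of:
#   nfs         an NFS mount failure was reported
#   image-pull  the image could not be pulled
#   unknown     neither pattern was found (for example, a slow download)
classify_pod_events() {
    events="$(cat)"
    case "$events" in
        *mount.nfs*|*FailedMount*) echo "nfs" ;;
        *ErrImagePull*|*ImagePullBackOff*) echo "image-pull" ;;
        *) echo "unknown" ;;
    esac
}
```

For example: kubectl describe pod oaamgmt-<pod> -n oaans | classify_pod_events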

5.8.2 Problems Running OAA.sh

This section provides troubleshooting tips for problems while running OAA.sh.

General failures during OAA.sh

If the OAA.sh deployment fails at any stage, you can generally fix the underlying issue and then rerun OAA.sh. Any configuration tasks already performed by the installation will be skipped when OAA.sh is rerun.

Log information

If the deployment fails while running OAA.sh and you need more detailed information, you can view the install.log. The log is accessible from within the management container at /u01/oracle/logs, or outside the container at <NFS_LOGS_PATH>.

If the install.log references another component log file, this log can also be found in the same location.

Trust and Cert Store Configuration failed

If you receive the following message:
Configuring Trust and Cert Store for OAA.
Trust and Cert Store Configuration failed. Check log /u01/oracle/logs/install.log for details.
and the install.log shows:
Configuring Trust and Cert Store for OAA.
Checking oauth.identityuri mentioned in /u01/oracle/scripts/settings/installOAA.properties
 Property oauth.identityuri is https, will proceed to download certificate for https://ohs.example.com
 Checking url connectivity for https://ohs.example.com - 139784391608128:error:02002071:system library:connect:No route to host:crypto/bio/b_sock2.c:110: 139784391608128:error:2008A067:BIO routines:BIO_connect:connect error:crypto/bio/b_sock2.c:111: connect:errno=113
Failed
This means that either:
  • The URL being accessed, for example https://ohs.example.com, is not accessible; or
  • The OHS (or load balancer) being accessed is using the default 443 port, and the oauth.identityuri parameter in installOAA.properties does not have :443 appended. See Editing the installOAA.properties.

Resolve the problem by editing <NFS_CONFIG>/installOAA.properties and running OAA.sh again.
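
Whether oauth.identityuri carries an explicit port can be checked before rerunning OAA.sh. The following is a minimal sketch: get_property and has_explicit_port are hypothetical helpers, the properties file is assumed to use plain key=value lines with no spaces around the equals sign, and IPv6 literal addresses are not handled.

```shell
# Prints the value of property $1 from the key=value file $2.
get_property() {
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }' "$2"
}

# Returns 0 if the URL in $1 has an explicit port after the host name.
has_explicit_port() {
    authority="${1#*://}"         # drop the scheme
    authority="${authority%%/*}"  # drop any path
    case "$authority" in
        *:[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_explicit_port "$(get_property oauth.identityuri <NFS_CONFIG>/installOAA.properties)" returns a non-zero status when the value is https://ohs.example.com and zero when it is https://ohs.example.com:443.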

OAuth creation fails during OAA.sh

During the installation, the OAuth domain, client, and resource server are created. If their creation fails, check that the OAuth parameters are correct. See Configuring OAuth and Oracle HTTP Server.

OAuth check fails during OAA.sh

This occurs if the httpd.conf and mod_wl_ohs.conf files are not updated. To update the values, see Configuring OAuth and Oracle HTTP Server.

Configuring OAM for OAA. OAM for OAA setup failed

If you receive the following message:
Configuring OAM for OAA. OAM for OAA setup failed. Check log /u01/oracle/logs/install.log for details.
The install.log may refer you to the /u01/oracle/logs/add_resources.log. In the add_resources.log you may see:
curl -sk --connect-timeout 30  -X POST 'https://ohs.example.com/oam/services/rest/11.1.2.0.0/ssa/policyadmin/resource'  -H 'Content-Type: application/json' -H 'Authorization: Basic <Base64EncodedUser:Password>' -d '{"queryString":null,"applicationDomainName":"IAM Suite","hostIdentifierName":"IAMSuiteAgent","resourceURL":"/oauth2/rest/**","protectionLevel":"EXCLUDED","QueryParameters":null,"resourceTypeName":"HTTP","Operations":null,"description":"/oauth2/rest/**","name":"/oauth2/rest/**","id":"1"}'
If no error is referenced, run the command from inside the management container. You will need to substitute <Base64EncodedUser:Password> with the value set for the parameter oauth.basicauthzheader in installOAA.properties. If you receive the error Error 401--Unauthorized, the value set for oauth.basicauthzheader is incorrect. Edit <NFS_CONFIG>/installOAA.properties and set the parameter to the correct value. Then rerun OAA.sh. See Editing the installOAA.properties for more details.
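
Before rerunning OAA.sh, the form of the oauth.basicauthzheader value can be checked locally. The following is a minimal sketch: looks_like_basic_credentials is a hypothetical helper that only confirms the value is base64 that decodes to <user>:<password>. It cannot confirm that the credentials themselves are accepted by OAM. The -d option shown is the GNU coreutils spelling of the decode flag.

```shell
# Returns 0 if $1 is valid base64 that decodes to "<user>:<password>".
looks_like_basic_credentials() {
    decoded="$(printf '%s' "$1" | base64 -d 2>/dev/null)" || return 1
    case "$decoded" in
        ?*:?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A non-zero status suggests that the value was pasted unencoded or was encoded without the colon separator.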

Problems starting pods

When OAA.sh reaches the Installing OAA section of its output, you can check the status of the pods by running:
kubectl get pods -n oaans
If any of the pods fail or do not start, you can run the following commands to get more details:
kubectl logs -f <pod> -n oaans
kubectl describe pod <pod> -n oaans
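
To pick out the pods that need attention, the kubectl get pods output can be filtered on its STATUS column. The following is a minimal sketch: list_unhealthy_pods is a hypothetical helper that assumes the default table layout (NAME, READY, STATUS, RESTARTS, AGE) and does not inspect the READY column.

```shell
# Reads `kubectl get pods` table output on stdin and prints "<name> <status>"
# for every pod whose STATUS is neither Running nor Completed.
list_unhealthy_pods() {
    awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

For example: kubectl get pods -n oaans | list_unhealthy_pods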

During OAA.sh pods fail to start and show CrashLoopBackOff

Run the kubectl logs <pod> -n <namespace> command against the pods showing the error. The following may be one of the reasons for the error:

Pods were not able to connect to https://ohs.example.com/.well-known/openid-configuration because the PathTrim and PathPrepend in the mod_wl_ohs.conf for that entry were not updated. See Configuring OAuth and Oracle HTTP Server.

OAA.sh installation timed out but pods show as running

If the OAA installation timed out but the OAA pods show no errors and eventually end up in running state, it is possible that the cluster took too long to download the OAA images. In this case the OAA pods will eventually start but the installation will not complete. If this happens, clean up the installation and rerun the installation script.

Kubectl reports "Unable to connect to the server: net/http: TLS handshake timeout"

Possible causes are:
  • Proxies are defined in the environment and the no_proxy environment variable does not include the cluster nodes. To resolve the issue, add the cluster node IP addresses or hostnames to the no_proxy environment variable.
  • The kube config file ~/.kube/config or /etc/kubernetes/admin.conf is not valid.
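
The first cause above can be checked by testing whether a given node appears in no_proxy. The following is a minimal sketch: in_no_proxy is a hypothetical helper that looks for an exact entry only, so domain suffixes and CIDR ranges in no_proxy are not interpreted.

```shell
# Returns 0 if host $1 is an exact entry in the comma-separated list $2.
in_no_proxy() {
    list="$(printf '%s' "$2" | tr -d ' ')"   # tolerate spaces after commas
    case ",$list," in
        *,"$1",*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, in_no_proxy <node_ip> "$no_proxy" returns a non-zero status when the node is missing from the list.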

Failed to import snapshot

If you receive the following message during installation:
Importing the snapshot file : /u01/oracle/scripts/oarm-12.2.1.4.1-base-snapshot.zip
Executing CURL : curl --silent -k --location --request POST 'https://ohs.example.com/policy/risk/v1/snapshots'
      --header 'Authorization: Basic b2FhaW5zdGFsbC1vYWEtcG9saWN5OldlbGNvbWUx'      --header 'Content-type: application/octet-stream'      --data-binary '@/u01/oracle/scripts/oarm-12.2.1.4.1-base-snapshot.zip'
Import status : {"status":"201","message":"Snapshot created successfully.","snapshot":{"name":"OARM Snapshot","description":"OARM Snapshot","snapshotId":"1","createTime":"10-10-2024 16:32:45"}}
Upload status : 201
Snapshot ID : 1
Applying snapshot : curl --silent -k --location --request POST 'https://ohs.example.com/policy/risk/v1/snapshots/1/apply'      --header 'Authorization: Basic b2FhaW5zdGFsbC1vYWEtcG9saWN5OldlbGNvbWUx'
Apply result : <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
parse error: Invalid numeric literal at line 1, column 7
Fail to apply the snapshot: 1
Failed to import snapshot.
Then it is possible that the ingress controller is timing out before the snapshot import has had a chance to complete. One possible solution is to increase the timeouts on the ingress resources as follows:
kubectl annotate ingress -n oaans nginx.ingress.kubernetes.io/proxy-read-timeout=3600 \
      nginx.ingress.kubernetes.io/proxy-connect-timeout=3600 \
      nginx.ingress.kubernetes.io/proxy-send-timeout=3600 --all

Error 'jq: error: Invalid escape at line 1, column 6' during Creating tap partner in OAA

If you see the following error while running OAA.sh, the oua.tapAgentFilePass value was not set in base64:
jq: error: Invalid escape at line 1, column 6 (while parsing '"\�"') at <top-level>, line 1:
.agentName |= if . == "" then "MFAOAAPartner17ohsapr9" else . end |             .privateKey |= if . == "" then "CECECECE0000000200000001..etc..  
jq: 1 compile error
Creating tap partner in OAA

To solve this problem, set the value to the base64 version of the password and run OAA.sh again. See Preparing the Properties file for Installation.
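
The base64 value can be produced as follows. This is a minimal sketch: encode_base64 is a hypothetical helper, and printf is used instead of echo so that no trailing newline is included in the encoded value.

```shell
# Prints the base64 encoding of the string $1 on a single line,
# without encoding a trailing newline.
encode_base64() {
    printf '%s' "$1" | base64 | tr -d '\n'
}
```

Set oua.tapAgentFilePass to the output of encode_base64 '<password>'.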

Bad Oracle Access Manager Request in DRSS Logs

If you see the following error in the DRSS pod logs:
<DATE> Thread[http-thread-34,5,server]: INFO oracle.security.am.drss.handler.oam.OAMHandler parseOAMResponse Exception during parseOAMResponse Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
 at [Source: (String)"<html><head><title>Bad Oracle Access Manager Request</title><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"></head><body><h1>Bad Oracle Access Manager Request</h1><p>Unable to process the request due to unexpected error.</p></body></html>
Then one of the following applies: oua.oamRuntimeEndpoint was set incorrectly in installOAA.properties, it was not set to the fully qualified hostname of the OAM server, or the OAM server is not functioning correctly.

Unable to delete the OAA domain from OAuth during cleanup

List all clients and resources within the domain and delete each one of them before deleting the domain:
  1. Encode the OAM administrator user and its password by using the command:
    echo -n <username>:<password> | base64
    For example:
    echo -n oamadmin:<password> | base64
    This value should be used for <ENCODED_OAMADMIN> in the example below.
  2. Run the following:
    $ curl --location --request DELETE 'http://<OAuth_Host>:<OAuth_port>/oam/services/rest/ssa/api/v1/oauthpolicyadmin/oauthidentitydomain?name=OAADomain' \
    --header 'Authorization: Basic <ENCODED_OAMADMIN>'
    OAuth Identity Domain is not empty. Kindly remove (resource/client) entities from identity domain
    $ curl --location --request GET 'http://<OAuth_Host>:<OAuth_port>/oam/services/rest/ssa/api/v1/oauthpolicyadmin/client?identityDomainName=OAADomain' --header 'Content-Type: application/json' --header 'Authorization: Basic <ENCODED_OAMADMIN>'
    $ curl --location --request GET 'http://<OAuth_Host>:<OAuth_port>/oam/services/rest/ssa/api/v1/oauthpolicyadmin/application?identityDomainName=OAADomain' --header 'Content-Type: application/json' --header 'Authorization: Basic <ENCODED_OAMADMIN>'