Troubleshoot Management Agents Service
This section covers typical issues and resolutions related to the Management Agents service, such as installing and deinstalling Management Agents and Management Gateways.
Topics:
- Troubleshoot Management Agents Installation and Configuration Issues
- Troubleshoot: Please uninstall the agent and remove the service file before installing the new agent!
- Troubleshoot: Java is not a 64-bit JVM! Please set path of a 64-bit JVM in the environment variable JAVA_HOME or Java not found please set your preferred path in JAVA_HOME.
- Troubleshoot: Agent Installation failed with message: useradd: Can't get unique GID (no more available GIDs)
- Troubleshoot: useradd: cannot create directory /usr/share/mgmt_agent
- Troubleshoot: Windows: The system cannot find the path specified. Agent install failed.
- Troubleshoot: Management Agent status is "Not Available" in Console after the initial installation
- Troubleshoot: Agent runs into OutOfMemoryException
- Troubleshoot: OCI Management Agent is not starting on Windows host
- Troubleshoot: Management Agent automatic upgrade is not working or skipped some Agents
- Troubleshoot: IP address being displayed in host column when Management Agent installed on Windows host
- Troubleshoot: Management Agent installation fails on SELinux when using external volume
- Troubleshoot: Management Agent installation fails on Red Hat Enterprise Linux 9.x
- Troubleshoot: Unable To View Prometheus Namespace and Metrics In Monitoring Service
- Troubleshoot: Flag provided but not defined
- Troubleshoot: Adding SSH Credentials Fails With Error: Illegal unquoted character
- Troubleshoot Management Agents on Compute Instances
- Troubleshoot: Agent is in Not Available state and agent log file reports "Invalid tags"
- Troubleshoot: Management Agent setup failed with fork/exec oracle.polaris.oca.main: permission denied
- Troubleshoot: Management Agent authentication failure due to clock skew, a different time on the compute instance compared to the time on the server
- Troubleshoot: OCI Management Agent Service: Agent Not Visible In OCI Console Under Observability & Management
- Troubleshoot Management Agents Upgrade Issues
- Troubleshoot Management Gateways
- Troubleshoot: Remove Management Gateway
- Troubleshoot: Configure Management Gateway
- Troubleshoot: Management Gateway installation fails on Red Hat Enterprise Linux 9.x
- Troubleshoot: Management Gateway Installation Fails With Error: Certificates could not be created and the Identity logs report: Authentication failed: DATE_OUTSIDE_CLOCK_SKEW
- Troubleshoot: When installing and configuring Management Agent, timed out error
Troubleshoot Management Agents Installation and Configuration Issues
Users may encounter various errors during the Oracle Management Agent installation and configuration process. Causes and recommended actions for some common errors are listed below.
- Troubleshoot: Please uninstall the agent and remove the service file before installing the new agent!
- Troubleshoot: Java is not a 64-bit JVM! Please set path of a 64-bit JVM in the environment variable JAVA_HOME or Java not found please set your preferred path in JAVA_HOME.
- Troubleshoot: Agent Installation failed with message: useradd: Can't get unique GID (no more available GIDs)
- Troubleshoot: useradd: cannot create directory /usr/share/mgmt_agent
- Troubleshoot: Windows: The system cannot find the path specified. Agent install failed.
- Troubleshoot: Management Agent status is "Not Available" in Console after the initial installation
- Troubleshoot: After configuration, the Management Agent is not visible in console or through the API
- Troubleshoot: Prometheus or Kubernetes metrics monitored using Management Agent are not available
- Troubleshoot: Agent runs into OutOfMemoryException
- Troubleshoot: OCI Management Agent is not starting on Windows host
- Troubleshoot: Management Agent automatic upgrade is not working or skipped some Agents
- Troubleshoot: IP address being displayed in host column when Management Agent installed on Windows host
- Troubleshoot: Management Agent installation fails on SELinux when using external volume
- Troubleshoot: Management Agent installation fails on Red Hat Enterprise Linux 9.x
- Troubleshoot: Unable To View Prometheus Namespace and Metrics In Monitoring Service
- Troubleshoot: Flag provided but not defined
- Troubleshoot: Adding SSH Credentials Fails With Error: Illegal unquoted character
Troubleshoot: Please uninstall the agent and remove the service file before installing the new agent!
Cause: An agent is already installed on your host, or a previous deinstallation did not remove the agent service file.
Action:
- Run the following command to uninstall the agent:
rpm -e oracle.mgmt_agent
If the command succeeds, try installing the new agent. If it doesn't work, try the next recommended action.
- Check whether files remain from the previous agent installation by running:
ls /opt/oracle/mgmt_agent
If the directory exists, delete it by running:
rm -rf /opt/oracle/mgmt_agent
- Check whether an agent service file already exists at the following location, depending on your Linux version:
  - For OL7 (if you are using systemd): /etc/systemd/system/mgmt_agent.service
  - For OL6 (if you are using init): /etc/init/mgmt_agent.conf
If you find the service file, remove it, and then retry installing the new agent. On OL7, run:
rm -rf /etc/systemd/system/mgmt_agent.service
On OL6, run:
rm -rf /etc/init/mgmt_agent.conf
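The checks above can be combined into a small script. The following is a minimal bash sketch; the function name find_agent_residue and its optional root-prefix argument are hypothetical, and it only reports the paths named in this section rather than deleting anything.

```shell
# Sketch: report leftovers from a previous Management Agent installation.
# The optional argument is a hypothetical root prefix (default "/") so the
# check can be tried on a test directory tree first.
find_agent_residue() {
  local root="${1:-/}"
  root="${root%/}"   # strip a trailing slash so "/" becomes "" and paths stay clean
  local p
  for p in /opt/oracle/mgmt_agent \
           /etc/systemd/system/mgmt_agent.service \
           /etc/init/mgmt_agent.conf; do
    if [ -e "${root}${p}" ]; then
      echo "${root}${p}"
    fi
  done
}

# Usage on a real host (review the output before removing anything):
#   rpm -q oracle.mgmt_agent    # is the package still registered?
#   find_agent_residue /
```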
Troubleshoot: Java is not a 64-bit JVM! Please set path of a 64-bit JVM in the environment variable JAVA_HOME or Java not found please set your preferred path in JAVA_HOME.
Cause: The JAVA_HOME environment variable is not set, or it does not point to a 64-bit JDK location.
Action: Set the JAVA_HOME environment variable to the correct 64-bit JDK and retry installing the agent. Currently, only 64-bit JDKs are supported.
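Before retrying, you can confirm that JAVA_HOME points to a 64-bit JVM. This is a minimal bash sketch, assuming a HotSpot-based JDK (such as Oracle JDK or OpenJDK), which reports the sun.arch.data.model property; the function name is hypothetical.

```shell
# Sketch: check whether a JDK directory (default: $JAVA_HOME) contains a 64-bit JVM.
# Assumes a HotSpot-based JDK that reports the sun.arch.data.model property.
is_64bit_java_home() {
  local java_home="${1:-$JAVA_HOME}"
  local java_bin="$java_home/bin/java"
  if [ -z "$java_home" ] || [ ! -x "$java_bin" ]; then
    echo "JAVA_HOME is not set or $java_bin is not executable" >&2
    return 2
  fi
  # -XshowSettings:properties prints the JVM properties to stderr.
  "$java_bin" -XshowSettings:properties -version 2>&1 \
    | grep -Eq 'sun\.arch\.data\.model = 64'
}

# Usage:
#   export JAVA_HOME=/usr/lib/jvm/jdk-64bit    # hypothetical path
#   is_64bit_java_home && echo "64-bit JVM" || echo "not a 64-bit JVM"
```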
Troubleshoot: Agent Installation failed with message: useradd: Can't get unique GID (no more available GIDs)
Cause: The installation script cannot add a user and group during the Management Agent installation process because all the available group IDs on your Linux system are already in use. The installation output looks similar to the following:
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
useradd: Can't get unique GID (no more available GIDs)
useradd: can't create group
Agent installation failed, please check log file
Action: Consult your system administrator before proceeding with the following steps:
- Edit the /etc/login.defs file. You need sudo privileges to edit the file. Look for the following entries:
SYS_GID_MIN nnnn
SYS_GID_MAX mmmm
SYS_UID_MIN pppp
SYS_UID_MAX qqqq
where nnnn and pppp are the minimum values, and mmmm and qqqq are the maximum values. If these entries don't exist in the file, add them.
- Update the value of the SYS_GID_MAX entry based on your system administrator's recommendation, and save the file.
- Remove the failed agent installation by running:
sudo rpm -e oracle.mgmt_agent
- Log out of the shell and log back in.
- Retry the agent installation.
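To give your system administrator concrete numbers when choosing a new SYS_GID_MAX, you can count how many GIDs in the configured system range are already taken. This is a minimal bash sketch; the function name gid_range_usage is hypothetical, and the file paths are parameters so it can be run against copies of /etc/login.defs and /etc/group.

```shell
# Sketch: show the SYS_GID range from login.defs and how many GIDs in it
# are already used in the group file.
gid_range_usage() {
  local defs="${1:-/etc/login.defs}" groups="${2:-/etc/group}"
  local min max used
  # Take the last uncommented value of each key; commented lines start with '#'.
  min=$(awk '$1 == "SYS_GID_MIN" {v = $2} END {print v}' "$defs")
  max=$(awk '$1 == "SYS_GID_MAX" {v = $2} END {print v}' "$defs")
  if [ -z "$min" ] || [ -z "$max" ]; then
    echo "SYS_GID_MIN or SYS_GID_MAX is not set in $defs" >&2
    return 1
  fi
  # Field 3 of the group file is the numeric GID; count each GID once.
  used=$(awk -F: -v lo="$min" -v hi="$max" \
    '$3 + 0 >= lo + 0 && $3 + 0 <= hi + 0 && !seen[$3]++ {n++} END {print n + 0}' "$groups")
  echo "SYS_GID_MIN=$min SYS_GID_MAX=$max used=$used free=$(( max - min + 1 - used ))"
}

# Usage:
#   gid_range_usage /etc/login.defs /etc/group
```

A result with free=0 matches the "no more available GIDs" message above.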
Troubleshoot: useradd: cannot create directory /usr/share/mgmt_agent
During the Management Agent installation, the mgmt_agent user is created with its default home directory under /usr/share/mgmt_agent.
Cause: The file permissions under /usr/share are insufficient, or the file system is read-only.
Possible Actions:
- Set file permissions that give the mgmt_agent user access to the default user home directory location, /usr/share.
- To use a different home directory location, set the USER_HOME_DIR_ROOT environment variable to the path that you prefer to use as the home directory location for the mgmt_agent user, and ensure that the mgmt_agent user has the right file permissions on that directory.
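A quick pre-check of the alternate location can save a failed install. This is a minimal bash sketch; the function name check_home_root and the path /data/home are hypothetical examples. Note that sudo does not pass your environment through by default, so the usage passes the variable explicitly with env.

```shell
# Sketch: verify that a candidate home-directory root exists and is writable
# by the current user before pointing USER_HOME_DIR_ROOT at it.
check_home_root() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "$dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "$dir is usable"
}

# Usage (hypothetical path; run the check as the user that runs the installer):
#   check_home_root /data/home && \
#     sudo env USER_HOME_DIR_ROOT=/data/home rpm -ivh <rpm_file_name.rpm>
```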
Troubleshoot: Windows: The system cannot find the path specified. Agent install failed.
ERRORLEVEL=9009
Possible Cause: Environment variables have not been set properly because a directory or folder name contains spaces.
Windows allows spaces in directory and folder names, which causes an issue with the Management Agent installation because Windows automatically adds quotes to such names. For example, for a folder named Program Files, Windows auto-inserts quotes because of the space, and the name becomes "Program Files".
The extra quotes cause an issue because the Management Agent installer does not allow quotes in environment variables such as JAVA_HOME and AGENT_INSTALL_BASEDIR.
The Management Agent installer does not accept the following special characters in the path: [, ^, ", ', &, or ].
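Action: Set the affected environment variables, such as JAVA_HOME and AGENT_INSTALL_BASEDIR, to paths that do not contain spaces or any of the unsupported characters listed above: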
- On the Windows taskbar, right-click the Windows icon and select System.
- In the Settings window, under
Related Settings, click Advanced
system settings.
- On the Advanced tab, click
Environment Variables.
- Click New to create a new environment variable. Click Edit to modify an existing environment variable.
- After creating or modifying the environment variable, click
Apply and then OK to have
the change take effect.
Note
The graphical user interface for creating environment variables may vary slightly, depending on your version of Windows.
Troubleshoot: Management Agent status is "Not Available" in Console after the initial installation
Possible Cause 1: Incorrect system timestamp
Action: Verify the system time on the agent's host and correct it if needed.
Possible Cause 2: If you use the input.rsp response file for the Management Agent, you must define the tags required by your Management Agent compartment. If the tags are not defined, you may see an error like this:
Attempts:
<--> Endpoint: management-agent.us-ashburn-1.oci.oraclecloud.com
opc-request-id: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXJ8
StartTime: 2024-09-18 03:45:12,662 GMT
Status: 400 Bad Request
Headers: Strict-Transport-Security=max-age=31536000; includeSubDomains;
Connection=close
Content-Length=63
opc-request-id=XXXXXXXXXXXXXXXXXXXXXXXX..................B25ADA8E
Date=Wed, 18 Sep 2024 03:45:12 GMT
Content-Type=application/json
ErrorBody:
{
"code" : "InvalidParameter",
"message" : "Invalid tags"
}
Action: To define the tags specific to your environment, add the following parameter to the input.rsp response file and specify the key-value pairs for your environment. For more information, see Create a Response file.
DefinedTags = [{"namespace1":{"<key1>":"<value1>"}},
{"namespace2":{"<key2>":"<value2>"}}]
Troubleshoot: After configuration, the Management Agent is not visible in console or through the API
Possible Cause: If the agent does not display in the Oracle Cloud Console or through the API after you configure the management agent or the management gateway agent, the correct policies may not be set up for the user or the user group.
Action: Verify that the user or the user group has the required policies configured for the management agent or gateway agent. To set up policies, see Create policies for user group.
Troubleshoot: Prometheus or Kubernetes metrics monitored using Management Agent are not available
- (a) Missing policies
Action: Verify that the policies required by Management Agent are added as described in the setup instructions. For details, see Set Up Oracle Cloud Infrastructure for Management Agent Service. If any policies are missing, add them as described in that topic.
- (b) Typos in policies
Action: Review the policy syntax for errors by comparing it against the policy samples. For details, see Set Up Oracle Cloud Infrastructure for Management Agent Service.
For example, ensure that the dynamic group is defined with the following syntax, using straight single quote characters (') around the compartment OCID and the managementagent resource type:
ALL {resource.type='managementagent', resource.compartment.id='ocid1.compartment.oc1.examplecompartmentid'}
- (c) Incorrect compartment OCID in the dynamic group definition
Action: Verify that the compartment OCID of the install key is the same as the compartment OCID specified in the agent's dynamic group definition. By default, the agent is created in the install key's compartment.
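Typographic quotes are an easy typo to introduce when a matching rule is copied from a web page or word processor. This minimal bash sketch builds the rule from a compartment OCID so it always uses straight quotes, and flags a pasted rule that contains curly quotes; both function names are hypothetical.

```shell
# Sketch: build the dynamic group matching rule with straight ASCII quotes.
make_agent_rule() {
  local compartment_ocid="$1"
  printf "ALL {resource.type='managementagent', resource.compartment.id='%s'}\n" "$compartment_ocid"
}

# Sketch: return 0 (true) if a rule contains typographic single quotes.
# -F matches the characters byte for byte, independent of the locale.
has_curly_quotes() {
  printf '%s' "$1" | grep -qF -e '‘' -e '’'
}

# Usage:
#   make_agent_rule ocid1.compartment.oc1.examplecompartmentid
#   has_curly_quotes "$pasted_rule" && echo "replace curly quotes with '"
```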
Troubleshoot: Agent runs into OutOfMemoryException
Possible Cause: The agent might run out of heap memory if it is not tuned to support the load that has been assigned to it.
Action: Update the heap memory settings for the Management Agent. The default maximum heap sizes are:
- 128 MB for the Management Agent running as an Oracle Cloud Agent (OCA) plug-in.
- 512 MB for the standalone Management Agent (the one downloaded from the Management Agent console).
To change the heap size:
- Open the file agent_inst/config/java.options.
- Update the heap setting by modifying the following line:
-Xmx512m
This line sets the maximum heap for the agent to 512 MB. For example, to change the heap to 800 MB, update the line to:
-Xmx800m
- Save the file and restart the agent for the changes to take effect.
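The edit above can be scripted. This is a minimal bash sketch, assuming GNU sed and a java.options file with a single -Xmx line; the function name set_agent_heap and the .bak backup are illustrative additions, not part of the agent tooling.

```shell
# Sketch: replace the -Xmx value in java.options with a new size in MB,
# keeping a backup copy of the original file. Assumes GNU sed (-i, -E).
set_agent_heap() {
  local file="$1" size_mb="$2"
  case "$size_mb" in
    ''|*[!0-9]*) echo "size must be a whole number of MB" >&2; return 1 ;;
  esac
  grep -q -e '-Xmx' "$file" || { echo "no -Xmx line in $file" >&2; return 1; }
  cp -p "$file" "$file.bak"
  sed -i -E "s/-Xmx[0-9]+[kKmMgG]?/-Xmx${size_mb}m/" "$file"
}

# Usage for a standalone agent:
#   set_agent_heap /opt/oracle/mgmt_agent/agent_inst/config/java.options 800
#   systemctl restart mgmt_agent
```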
Troubleshoot: OCI Management Agent is not starting on a Windows host
Possible Cause: If the agent starts and then fails with the following errors, the automatic upgrade of the Management Agent may have failed.
C:\Oracle\mgmt_agent\agent_inst\log>NET START mgmt_agent
The Oracle Management Agent service is starting...................
The Oracle Management Agent service could not be started.
A service specific error occurred: 1.
More help is available by typing NET HELPMSG 3547.
The following errors may also appear in the log file C:\Oracle\mgmt_agent\agent_inst\log\mgmt_agent.log:
[SysExecutor.0 (PrometheusEmitter.Agent-discovery)-131] INFO - DiscoveryItemTask PrometheusEmitter.Agent-discovery - autoPromote
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Cleaning up old files...
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - On windows, skipping file owner check
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Starting agent upgrade from version [231002.2039] to version [231002.2040]...
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Inserted RequestSigner associated with request SigningRequester[get([])] for signingKey:SigningKey[xxxxxxxxxxxx]
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Package Stream size:99003892
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully unzipped agent upgrade package at:
C:\Oracle\mgmt_agent\zip\unpack
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully copied C:\Oracle\mgmt_agent\agent_inst\bin\agentUpgrader.bat to
C:\Oracle\mgmt_agent\agent_inst\bin\tmpAgentUpgrader.bat
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Successfully deleted previous wrapper backup file:
C:\Oracle\mgmt_agent\agent_inst\config\wrapper.conf.backedUpForUpgrade
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Backed up wrapper.conf to attempt agent upgrade
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Built macros for processing wrapper.conf as:{%SERVICE_TYPE%=mgmt_agent,%JAVA_HOME%=c:\Program Files\Java\jre-1.8,%EMSTATE%=C:\Oracle\mgmt_agent\agent_inst, %CORE_JAR%=agent-upgrader-1.0.3235.jar,%VERSION%=231002.2039, %ORACLE_HOME%=C:\Oracle\mgmt_agent\231002.2039}
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Processed wrapper.conf.template to point it to agent upgrader
[SysExecutor.1 (ManagedAgent upgrade checker)-133] INFO - Exiting for wrapper to spin up the agent upgrader...
Action: To fix the issue:
- Stop the Management Agent on the Windows host and go to the agent configuration directory by running the following commands:
NET STOP mgmt_agent
cd C:\Oracle\mgmt_agent\agent_inst\config
Back up the current wrapper.conf file (for example, by moving it out of this directory), and then rename wrapper.conf.backedUpForUpgrade to wrapper.conf.
- Start the Management Agent service.
- Upgrade the Management Agent. See Windows Manual Upgrade.
After the upgrade, the agent displays as Active under Observability & Management.
Troubleshoot: Management Agent automatic upgrade is not working or skipped some Agents
Possible Cause: If the OCI Management Agent automatic upgrade is not working for some of the Management Agents, some of the files or directories under the agent file system may be owned by invalid owners.
For example, if some of the files or directories under /opt/oracle/mgmt_agent/agent_inst do not have the correct ownership, the automatic upgrade does not run.
In the log file /opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log, you may find the following error:
ERROR - Following files are owned by invalid owners: [/opt/oracle/mgmt_agent/db00_cred.json,
/opt/oracle/mgmt_agent/agent_inst/config/emd.properties.backup]
(ManagedAgent upgrade checker)-32] WARN - Files with invalid owners were found, skipping auto-upgrade
Action: On the Management Agent host, confirm that all the files and directories under the agent file system are owned by the mgmt_agent user and the mgmt_agent group (mgmt_agent:mgmt_agent), so that the Management Agent automatic upgrade can complete.
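To find the offending files without waiting for the next upgrade attempt, you can search the agent directory for anything not owned by mgmt_agent:mgmt_agent. This is a minimal bash sketch using find; the function name is hypothetical, and the user and group default to the values given in this section.

```shell
# Sketch: list files and directories under DIR that are not owned by the
# expected user and group (defaults: mgmt_agent / mgmt_agent).
find_invalid_owners() {
  local dir="$1" user="${2:-mgmt_agent}" group="${3:-mgmt_agent}"
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Usage (as root):
#   find_invalid_owners /opt/oracle/mgmt_agent
# Then, after reviewing each listed path with your administrator:
#   chown mgmt_agent:mgmt_agent <file>
```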
Troubleshoot: IP address being displayed in host column when Management Agent installed on Windows host
Problem: Management Agent is installed on a Windows host, and the Oracle Cloud Console displays the Windows host IP address in the Host column instead of the fully qualified domain name (FQDN) or the Windows host name.
Action:
- Log in to your Windows host and open the Control Panel.
- Select System and Security, and then select System.
- Go to the Computer name, domain, and workgroup settings section, and then click Change settings.
The System Properties window displays.
- If it's not already selected, click Computer Name.
- Locate the following message: To rename this computer or its domain or workgroup, click Change.
- Click Change. The Computer Name/Domain Changes window displays.
In the Computer Name text box, enter the short Windows host name. For example, if the FQDN of the Windows host is FOOBAR004.subnet1ab2regsu.exampletenantreg1.abcvcn.com, enter FOOBAR004.
- Click More. The DNS Suffix and NetBIOS Computer Name window displays.
- In the Primary DNS suffix of this computer text box, enter the DNS domain of the Windows host. For example:
subnet1ab2regsu.exampletenantreg1.abcvcn.com
- Click OK or Apply, and then close all the open windows.
- Restart the Windows host.
- Uninstall the existing Management Agent by running the uninstaller.bat script from a Windows terminal.
- Install the Management Agent on the Windows host again.
The Management Agent installation should succeed, and the Agents page should display the FQDN of the Windows host in the Host column.
Troubleshoot: Management Agent installation fails on SELinux when using external volume
When the agent is installed on an external volume, the agent service may fail to start with errors similar to the following:
$ systemctl start mgmt_agent
Job for mgmt_agent.service failed because the control process exited with error code.
See "systemctl status mgmt_agent.service" and "journalctl -xeu mgmt_agent.service" for details.
$ journalctl -xeu mgmt_agent.service
...
Dec 08 15:48:19 ol9-arm systemd[1261408]: mgmt_agent.service: Failed to execute /dir1/oracle/managementagent/agent_inst/bin/agentcore: Permission denied
Dec 08 15:48:19 ol9-arm systemd[1261408]: mgmt_agent.service: Failed at step EXEC spawning /dir1/oracle/managementagent/agent_inst/bin/agentcore: Permission denied
$ ausearch -ts recent -m avc -i
...
type=AVC msg=audit(12/08/2023 15:49:26.991:51338) : avc: denied { read open } for pid=1261576 comm=(gentcore) path=/dir1/oracle/managementagent/agent_inst/bin/agentcore dev="dm-0" ino=915154 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:object_r:default_t:s0 tclass=file permissive=0
All the above error messages indicate that SELinux does not allow programs to run from the chosen folder.
Action: Contact your system administrator to create the SELinux policies required to install and run the Management Agent from that location.
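When you contact your administrator, it helps to list exactly which agent files were denied and which SELinux type they carry. This minimal bash sketch reads audit records on stdin (for example, from the ausearch command shown above) and prints the denied path with the type from its tcontext field; the function name is hypothetical, and it only parses text, it does not change any policy.

```shell
# Sketch: print "path type" for SELinux AVC denials under a given directory.
# Reads ausearch -i style records on stdin; field names follow the sample above.
agent_avc_denials() {
  local prefix="$1" line path tctx
  grep 'avc: *denied' | while IFS= read -r line; do
    path=$(printf '%s\n' "$line" | grep -o 'path=[^ ]*' | head -n1 | cut -d= -f2-)
    tctx=$(printf '%s\n' "$line" | grep -o 'tcontext=[^ ]*' | head -n1 | cut -d= -f2-)
    case "$path" in
      "$prefix"*) printf '%s %s\n' "$path" "$(printf '%s' "$tctx" | cut -d: -f3)" ;;
    esac
  done
}

# Usage (as root):
#   ausearch -ts recent -m avc -i | agent_avc_denials /dir1/oracle/managementagent
#   ls -Z /dir1/oracle/managementagent/agent_inst/bin/agentcore
```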
Troubleshoot: Management Agent installation fails on Red Hat Enterprise Linux 9.x
The Management Agent installation fails, and the following error message may display: mgmt_agent service creation failed. Reason: Detected Linux.
Additionally, the installation failure log messages may confirm the error and indicate that the setup attempts to use an incorrect service manager to install the agent.
Cause: Red Hat removed the chkconfig package from the Red Hat Enterprise Linux (RHEL) 9 distribution. For more details, see the Red Hat Knowledge base.
Action:
- Confirm that the environment uses Red Hat Enterprise Linux 9.x by running the following command:
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
- Review the installation output. The messages below highlight the problem: the OS family was not identified correctly by the rules in the agentcore script, so the installer attempts to set up the agent service using init.d instead of systemctl on RHEL 9.x.
$ rpm -ivh oracle.mgmt_agent.231118.1208.Linux-x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Checking pre-requisites
Checking if any previous agent service exists
Checking if OS has systemd or initd
Checking available disk space for agent install
Checking if /opt/oracle/mgmt_agent directory exists
Checking if 'mgmt_agent' user exists
'mgmt_agent' user already exists, the agent will proceed installation without creating a new one.
Checking Java version
Trying /omc/java/jdk1.8.0_391
Java version: 1.8.0_391 found at /omc/java/jdk1.8.0_391/bin/java
Checking agent version
Updating / installing...
1:oracle.mgmt_agent-231118.1208.1################################# [100%]
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
Initializing software from template
Checking if JavaScript engine is available to use
Creating 'mgmt_agent' daemon
mgmt_agent service creation failed. Reason: Detected Linux: Installing the mgmt_agent daemon...
ln: failed to create symbolic link '/etc/init.d/mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/K20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/S20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/S20mgmt_agent': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/K20mgmt_agent': No such file or directory
Service not installed.
warning: %post(oracle.mgmt_agent-231118.1208-1.x86_64) scriptlet failed, exit status 1
- Verify that the chkconfig package is missing, as described in the Red Hat Knowledge base.
Solution 1 - Install the chkconfig package
- Install the missing package by executing the following
command:
$ dnf install chkconfig
- Validate the package exists in the environment by executing the following
command:
$ rpm -qa | grep chkconfig
- Install the Management Agent again.
Solution 2 - Without Installing the chkconfig package
This is a workaround; use it only if the chkconfig package cannot be installed. The recommended solution is to install the chkconfig package, as described in Solution 1.
If installing the chkconfig package is not an option, complete the following steps to install the Management Agent software:
- Switch to a root shell.
- Set the environment variable DIST_LINUX_FAMILY_OVERRIDE="Red Hat".
- Install the Management Agent software.
$ sudo /bin/bash
$ export DIST_LINUX_FAMILY_OVERRIDE="Red Hat"
# RPM install
$ rpm -ivh <rpm_file_name.rpm>
# ZIP install
$ ./installer.sh <full_path_of_response_file>
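To decide which solution applies before you install, you can check the release file and the chkconfig package in one step. This is a minimal bash sketch; the function name and its messages are hypothetical, and the release-file argument exists only so the logic can be tried on a copy of /etc/redhat-release.

```shell
# Sketch: report whether this host is RHEL 9.x and whether chkconfig is installed.
rhel9_chkconfig_advice() {
  local release_file="${1:-/etc/redhat-release}"
  if ! grep -q 'Red Hat Enterprise Linux release 9' "$release_file" 2>/dev/null; then
    echo "not RHEL 9.x"
    return 0
  fi
  if rpm -q chkconfig >/dev/null 2>&1; then
    echo "RHEL 9.x with chkconfig installed"
  else
    echo "RHEL 9.x without chkconfig: use Solution 1, or Solution 2 if it cannot be installed"
  fi
}

# Usage:
#   rhel9_chkconfig_advice
```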
Troubleshoot: Unable To View Prometheus Namespace and Metrics in the OCI Monitoring Service
In the OCI Console, if the required policies are set up correctly but the Prometheus namespace and metrics are not visible in the Metrics Explorer of OCI Monitoring, confirm that the mgmt_agent OS user has read permissions on the .properties file. This file may be owned by the root OS user with 600 permissions, for example:
agent_inst/discovery/PrometheusEmitter/compute_exporter.properties
Action:
- Confirm that the .properties file in agent_inst/discovery/PrometheusEmitter is owned by the mgmt_agent OS user and that the mgmt_agent OS user has read permissions on this file.
- Restart the OCI Management Agent.
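The ownership check above can be run across every .properties file in the directory at once. This is a minimal bash sketch; the function name is hypothetical, and you pass the agent_inst directory for your installation.

```shell
# Sketch: print .properties files under PrometheusEmitter that are not owned
# by the agent user, or whose owner read bit (0400) is not set.
check_prometheus_props() {
  local agent_inst="$1" user="${2:-mgmt_agent}"
  find "$agent_inst/discovery/PrometheusEmitter" -name '*.properties' \
    \( ! -user "$user" -o ! -perm -400 \) -print
}

# Usage for a standalone agent (as root):
#   check_prometheus_props /opt/oracle/mgmt_agent/agent_inst
# For each listed file: chown mgmt_agent:mgmt_agent <file>, then restart the agent.
```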
Troubleshoot: Flag provided but not defined
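Error: Running the oci-managementagent plug-in binary with the -trusted-certs-dir option fails with the error flag provided but not defined: -trusted-certs-dir. The usage text printed after the error lists the flags that this version of the binary accepts, and -trusted-certs-dir is not among them. For example: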
$ sudo -u oracle-cloud-agent /usr/libexec/oracle-cloud-agent/plugins/oci-managementagent/oci-managementagent -cli -trusted-certs-dir=/tmp/trustedcerts
flag provided but not defined: -trusted-certs-dir
Usage of /usr/libexec/oracle-cloud-agent/plugins/oci-managementagent/oci-managementagent:
-agent-config string
agent config yml file
-cli
run the monitoring in cli mode
-debug
enable debug logging
-dev
enable dev runs
-force-redeploy
force redeploy image
-metadata-config string
metadata config json file
-oci-config string
oci config file
-staging
enable staging endpoint
-upgrade-native-agent
invoke native agent upgrade
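Before scripting a command that passes an optional flag, you can check whether the installed binary lists it. This minimal bash sketch reads usage text on stdin and looks for the flag as its own entry; the function name is hypothetical, and the usage comment assumes the binary prints its flag list for -h, as Go-style flag parsers (which produce the "flag provided but not defined" message) typically do.

```shell
# Sketch: return 0 if FLAG appears as its own entry in usage text read on stdin.
flag_supported() {
  local flag="$1"
  grep -Eq -- "^[[:space:]]*${flag}([[:space:]]|$)"
}

# Usage (assumption: -h prints the flag list shown above):
#   /usr/libexec/oracle-cloud-agent/plugins/oci-managementagent/oci-managementagent -h 2>&1 \
#     | flag_supported -trusted-certs-dir || echo "flag not supported by this version"
```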
Troubleshoot: Adding SSH Credentials Fails With Error: Illegal unquoted character
Possible Cause: When you add source credentials to an agent, the illegal unquoted character error means that the JSON file is not formatted correctly. For example, if the SSH key value spans multiple lines, the raw line breaks inside the JSON string are not allowed, the key is not recognized, and the following error is reported:
[root@host ociagent]# cat 2nd.json | sudo -u mgmt_agent /opt/oracle/mgmt_agent/agent_inst/bin/credential_mgmt.sh -o upsertCredentials -s logan
com.fasterxml.jackson.databind.JsonMappingException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in string value
at [Source: (BufferedInputStream); line: 7, column: 70] (through reference chain: oracle.polaris.core.source.metadata.impl.creds.CredentialFormat["properties"]->java.lang.Object[][1]->oracle.polaris.core.source.metadata.impl.creds.CredentialFormat$Property["value"])
at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361)
...
Action:
- Add the \n escape sequence before each line of the key. For example:
{"source":"host.myvm.example.com",
 "name":"OSCreds",
 "type":"SSHKeyCreds",
 "description":"SSH keys for a user",
 "properties":[
  {"name":"SSHUserName","value":"username"},
  {"name":"SSHPrivateKey","value":"-----BEGIN RSA PRIVATE KEY-----\nMIICXQIBAAKBgQCKWjoLfOKsjglGQcKwB0zm1o/OabClELjcOOTS1FJh6pzvrDeL \nn3IfIW9VUiyfGNkjnj4cuO0mVctaQgGVtT6H+4fL8HKjWqPg9S+uc0WBKBzaLi9H \nAoGACZctlORIVkvWSr9+PnOTGiFfgKCE9TxOhD2RZyf+ufjofhjDFPOtlojbzd9P \nZovzaWurxJPxJIon+Y6/y1/wAKUFisOlY2XJl76NKXm/00OGSfocQ3WsxapEsWwR \nalRL0l5FhXVpTV5OH3M4Dy5ksIcDqiV6r \nMejuJ++3AHlflzzoITtmS3RDlpSsd27ZH9vzV9HgFQU3volRgOZqnVm/XXXXXXX \n-----END RSA PRIVATE KEY-----"},
  {"name":"SSHPublicKey","value":"-----BEGIN PUBLIC KEY-----\nMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCKWjoLfOKsjglGQcKwB0zzzz/O \nabClELjcOOTS1FJh6pzvrDeLn3IfIW9VUiyfQAcfTWRwb0JtzMcRONQIDAQAB \n-----END PUBLIC KEY-----"}
 ]}
- Or put the complete key value on a single line.
Now the JSON is formatted correctly, and you can retry the operation to add the source credentials.
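Rather than editing a key by hand, you can convert the key file into the escaped single-line form. This is a minimal bash sketch using awk; the function name and key path are hypothetical. It assumes a PEM body (base64), which contains no quotes or backslashes that would also need JSON escaping.

```shell
# Sketch: print a multi-line PEM file as one line, joining lines with the
# two characters \n so the result can be used as a JSON string value.
pem_to_json_value() {
  awk 'NR > 1 { printf "%s", "\\n" } { sub(/\r$/, ""); printf "%s", $0 }' "$1"
}

# Usage (hypothetical key path):
#   key=$(pem_to_json_value ~/.ssh/id_rsa_agent)
#   printf '{"name":"SSHPrivateKey","value":"%s"}\n' "$key"
```

Passing the key through printf with %s keeps the \n sequences literal, which is what the JSON parser expects.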
Troubleshoot Management Agents Deinstallation Issues
This topic covers the typical issues and their resolutions related to deinstalling Oracle Management Agents.
Error:… specifies multiple packages
Cause: The rpm registry has multiple packages with that name.
Action: Use the --allmatches flag when running the rpm -e command:
rpm -e oracle.mgmt_agent --allmatches
Error:… scriptlet failed with exit code
Cause: The rpm could not stop the running agent or failed to remove the agent service file from the system.
Action:
- Check whether your agent is running:
For OL7:
systemctl status mgmt_agent
For OL6:
/sbin/initctl status mgmt_agent
If the agent is running, stop it:
For OL7:
systemctl stop mgmt_agent
For OL6:
/sbin/initctl stop mgmt_agent
- Remove the rpm by running the following command, which skips all rpm scripts and tries to remove the package from the rpm registry:
rpm -e oracle.mgmt_agent --noscripts
- Remove all the agent files by running:
rm -rf /opt/oracle/mgmt_agent
Also run the following:
For OL7:
rm -rf /etc/systemd/system/mgmt_agent.service
For OL6:
rm -rf /etc/init/mgmt_agent.conf
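The OL7 and OL6 branches above can be selected automatically. This minimal bash sketch prints the cleanup commands for the detected init system instead of running them, so you can review them first; the function name is hypothetical, and detecting systemd through the /run/systemd/system directory is a common convention assumed here, not something stated in this documentation.

```shell
# Sketch: print the stop/remove commands for this host's init system.
# The optional argument is a root prefix, so the detection can be tested.
agent_cleanup_plan() {
  local root="${1:-}"
  if [ -d "$root/run/systemd/system" ]; then
    echo "systemctl stop mgmt_agent"
    echo "rpm -e oracle.mgmt_agent --noscripts"
    echo "rm -rf /opt/oracle/mgmt_agent /etc/systemd/system/mgmt_agent.service"
  else
    echo "/sbin/initctl stop mgmt_agent"
    echo "rpm -e oracle.mgmt_agent --noscripts"
    echo "rm -rf /opt/oracle/mgmt_agent /etc/init/mgmt_agent.conf"
  fi
}

# Usage: agent_cleanup_plan    # review the printed commands, then run them as root
```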
Troubleshoot Management Agent Upgrade Issues
When you upgrade Oracle Management Agent, you can use the following list to troubleshoot common errors.
Troubleshoot: Auto upgrade is enabled, but the agent does not upgrade automatically because of invalid file owners
Cause: You can configure Management Agents to upgrade automatically. The automatic upgrade option applies at the tenancy level, so if you select it in the Oracle Cloud Console, all the agents in your OCI tenancy upgrade automatically. After a new version of the agent is available in the Management Agent Cloud Service, it may take up to 24 hours for the agent to upgrade automatically. If the agent version is not updated after 24 hours, issues on the disk could be preventing the agent from upgrading automatically.
The most common cause of this error is that files are owned by an OS user other than the user that installed the Management Agent. The upgrade process runs as the same OS user as the currently running agent process and cannot switch to root. Any file that a user manually creates in the mgmt_agent directory can interfere with the agent's ability to upgrade automatically.
To identify the affected files, check the mgmt_agent.log file at the following locations:
- For the Standalone Management Agent:
/opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log
- For the Management Agent plug-in on Oracle Cloud Agent (OCA) on
OCI Compute Instances:
/var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/log/mgmt_agent.log
In the mgmt_agent.log file, you may see the following error indicating the problematic files:
2024-08-14 18:13:31,857 [SysExecutor.7 (ManagedAgent upgrade checker)-36] ERROR - Following files are owned by invalid owners: [/opt/oracle/mgmt_agent/agent_inst/config/emd.properties.oldbackup]
2024-08-14 18:13:31,857 [SysExecutor.7 (ManagedAgent upgrade checker)-36] WARN - Files with invalid owners were found, skipping auto-upgrade
Action:
- Change the owner and group of the affected files to the user account that originally installed the Management Agent.
- If a file was created with the wrong owner, you can delete it or move it to a directory outside the Management Agent directory. Depending on your installation, the Management Agent directory is at one of the following locations:
  - For the Standalone Management Agent: /opt/oracle/mgmt_agent/
  - For the Management Agent plug-in for Oracle Cloud Agent in an OCI Compute instance: /var/lib/oracle-cloud-agent
Note
To avoid these issues, do not manually create any files in the Management Agent directory.
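The file list in the log message can be extracted directly, one path per line, which is easier to review than the bracketed message. This is a minimal bash sketch using grep and sed; the function name is hypothetical, and it assumes the message format shown above.

```shell
# Sketch: extract the paths listed in "owned by invalid owners" messages
# from a mgmt_agent.log file, one path per line, without duplicates.
invalid_owner_files() {
  grep 'owned by invalid owners' "$1" \
    | sed 's/.*invalid owners: *\[\(.*\)\].*/\1/' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | sort -u
}

# Usage:
#   invalid_owner_files /opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log
```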
Troubleshoot Management Agents on Compute Instances
Users may encounter various errors during the deployment of Oracle Management Agent on compute instances. Causes and recommended actions for some common errors are listed below.
- Troubleshoot: Agent is in 'Not Available' state
- Troubleshoot: Management Agent setup failed with fork/exec oracle.polaris.oca.main: permission denied
- Troubleshoot: Management Agent authentication failure due to clock skew, a different time on the compute instance compared to the time on the server
- Troubleshoot: OCI Management Agent Service: Agent Not Visible In OCI Console Under Observability & Management
Troubleshoot: Agent is in Not Available state and agent log file reports "Invalid tags"
The Management Agents page shows the agent in the 'Not Available' state, and the mgmt_agent.log file (located in the <Agent_Inst>/log directory) reports the following message:
ErrorBody:{"code" : "InvalidParameter","message" : "Invalid tags: Resource creation failed because the resource requires tag value(s). Add a value to the each of the following tag definition(s): \nGLOBAL.ComponentType, GLOBAL.ApplicationName,
Cause:
This issue can happen when the compartment requires mandatory tags for every resource and the resource creation request does not include those tags. In that case, the activation request fails with the message "Invalid tags: Resource creation failed because the resource requires tag value(s)" and the agent status is shown as 'Not Available'.
Action:
- Management Agents
If you have a standalone Management Agent that was installed using an RPM or a ZIP file, you must uninstall it and then reinstall it with a response file that sets the DefinedTags parameter, as described in the Review Agent Parameters section.
- Management Agents on Compute Instances
If the Management Agent is enabled through the OCI Console using the Oracle Cloud Agent (OCA) plugin, there is no response file, because response files are not used for compute instances. In this case, do the following:
- Log in to the instance where the Management Agent is deployed and switch to the oracle-cloud-agent user using the following command:
sudo -u oracle-cloud-agent sh
- Create an agent.definedtags file in the following location:
/var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/config/security/resource/
- Add the defined tags required to create the resource to the agent.definedtags file. For example, if there are two namespaces, admin_namespace and finance_namespace, and each namespace uses the two key-value pairs environment_type=non-prod and sensitivity=restricted, then you can use the following:
DefinedTags = [{"admin_namespace": {"environment_type": "non-prod", "sensitivity": "restricted"}, "finance_namespace": {"environment_type": "non-prod","sensitivity": "restricted"}}]
- Restart oracle-cloud-agent using the command:
sudo systemctl restart oracle-cloud-agent
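If you prefer to create the agent.definedtags file from a script, the following sketch writes the example DefinedTags line shown above. The function name write_definedtags is hypothetical. Replace the namespaces, keys and values with the tags your compartment actually requires.

```shell
# Hypothetical helper: write agent.definedtags into the given directory.
# The DefinedTags line mirrors the documented example; edit it for your tags.
write_definedtags() {
  target_dir=$1
  cat > "$target_dir/agent.definedtags" <<'EOF'
DefinedTags = [{"admin_namespace": {"environment_type": "non-prod", "sensitivity": "restricted"}, "finance_namespace": {"environment_type": "non-prod", "sensitivity": "restricted"}}]
EOF
}

# Example (as the oracle-cloud-agent user):
#   write_definedtags /var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/config/security/resource
```

After writing the file, restart oracle-cloud-agent as described in the steps.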
Troubleshoot: Management Agent setup failed with fork/exec oracle.polaris.oca.main: permission denied
Users may encounter this error, which causes the Management Agent to fail to install or start.
The error message shown in the Plugin view of the compute instance for the Management Agent plugin looks similar to the following:
workflow.go:23: [ERROR] step [*core.SetupImageStep] execution failed with [setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]]
mgmtagent_image.go:139: [ERROR] bootstrap workflow failed with error setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]
agent.go:74: [ERROR] failed to start agent during bootstrap with [setup image failed with [fork/exec 230821.1905/bin/oracle.polaris.oca.main: permission denied]]
Possible Cause:
This issue may happen when a compute instance disallows fork/exec operations from the /tmp directory by mounting tmpfs with the noexec flag. To check, run the following command:
$ mount | grep tmpfs
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,inode64)
If the options for /tmp include the noexec flag, as in the example output above, this is the cause of the error.
Action:
- Stop Oracle Cloud Agent.
sudo systemctl stop oracle-cloud-agent
- Add the following setting to the file: /etc/oracle-cloud-agent/plugins/oci-managementagent/config.yml
overrideTmpDir: true
- Start Oracle Cloud Agent.
$ sudo systemctl start oracle-cloud-agent
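The check and the configuration change above can be scripted. This is a minimal sketch. has_noexec and enable_override_tmpdir are hypothetical helper names. enable_override_tmpdir only appends overrideTmpDir: true when the key is not already present; it does not rewrite an existing value.

```shell
# Succeeds if a comma-separated mount-options string contains noexec.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: append the override to the plugin config.yml once.
enable_override_tmpdir() {
  config_file=$1
  if ! grep -q '^overrideTmpDir:' "$config_file" 2>/dev/null; then
    printf '%s\n' 'overrideTmpDir: true' >> "$config_file"
  fi
}

# Example (as root, when /tmp is a separate mount; stop oracle-cloud-agent first):
#   has_noexec "$(findmnt -no OPTIONS /tmp)" && echo "/tmp is mounted noexec"
#   enable_override_tmpdir /etc/oracle-cloud-agent/plugins/oci-managementagent/config.yml
```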
Troubleshoot: Management Agent authentication failure due to clock skew, a different time on the compute instance compared to the time on the server
Cause: If the clock on the Compute Instance where the agent is running differs by more than 5 minutes from the Oracle Cloud Infrastructure Identity service, requests are rejected with HTTP 401 (Unauthorized).
You may find the following errors. On the OCI Compute Instance, in the Oracle Cloud Agent tab, the Management Agent displays an error in the Message column:
rpc error: code = Unavailable desc = connection error: desc = "transport: error while
dialing: dial unix /var/lib/oracle-cloud-agent/tmp/plugin1825606937: connect: connection
refused"
Or in the logs you may find the following error:
2022-12-09 07:41:22,144 [SysExecutor.0 (Resource Principal Token Refresher)-47] WARN - #-# invocation access log [request-id-prefix: K9YBE4AY] #-#
Service: OCI
Method: GET
Path: /20200202/managementAgents/ocid1.managementagent.....rvf6i3ba/resourcePrincipalToken
Headers:opc-rpt-request-token=********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
date=Fri, 09 Dec 2022 07:41:19 GMT
host=management-agent.ap-tokyo-1.oci.oraclecloud.com
Authorization=******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
opc-request-id=K9YBE4AYQFIMP2J6HGQ2JUKU1IJPFYVO
User-Agent=Jersey/2.34 (Apache HttpClient 4.5.13)
Attempts:
<--> Endpoint: management-agent.ap-tokyo-1.oci.oraclecloud.com
opc-request-id: K9YBE4AYQFIMP2J6HGQ2JUKU1IJPFYVO
StartTime: 2022-12-09 07:41:19,976 GMT
Status: 401 Unauthorized
Headers: X-Content-Type-Options=nosniff
Content-Length=187
opc-request-id=K9YBE4AYQFIMP2J6HGQ2JUKU1IJPFYVO/E4356B68C6C541BAD867E46760316D35/4118B130EE46A8E25F90DC91AB7F12D7
Date=Fri, 09 Dec 2022 07:41:21 GMT
Content-Type=application/json
ErrorBody:
{
"code" : "NotAuthenticated",
"message" : "Unable to authenticate the request for ocid1.managementagent.oc1.ap-tokyo-1.amaaaa...6frjnrbvqrvf6i3ba"}
Action:
Fix the clock skew and restart the agent. If the agent has been down for days because of this error, you must delete the configure.donotrestart file before restarting the agent.
Oracle also recommends configuring the OS date and time to synchronize automatically with NTP servers to avoid downtime in the future. If other services are running on the machine, it's best practice to restart the machine after the time change so that those services pick up the new time.
To correct the OS date and time on the host where the agent is running and then restart the agent, follow these steps:
- To stop the agent run the following
command:
sudo systemctl stop oracle-cloud-agent
- Correct the date and time.
- Run the following command to delete the configure.donotrestart file:
sudo rm /var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/config/configure.donotrestart
- Start the
agent.
sudo systemctl start oracle-cloud-agent
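Before correcting the time, you can estimate the skew by comparing the local clock with the Date header that the service endpoint returns. This sketch depends on several assumptions:
- It requires GNU date, for date -d.
- The helper names skew_seconds and skew_exceeds_limit are hypothetical.
- The 300-second threshold reflects the 5-minute limit described above.

```shell
# Absolute difference in seconds between an HTTP Date value ($1) and a local
# epoch time ($2). Requires GNU date to parse the RFC 1123 date string.
skew_seconds() {
  server_epoch=$(date -d "$1" +%s) || return 1
  delta=$(($2 - server_epoch))
  if [ "$delta" -lt 0 ]; then delta=$((-delta)); fi
  echo "$delta"
}

# Succeeds if the skew (in seconds) is larger than the 5-minute window.
skew_exceeds_limit() {
  [ "$1" -gt 300 ]
}

# Example (assumes the endpoint returns a Date header; use your region's host):
#   server_date=$(curl -sI https://management-agent.ap-tokyo-1.oci.oraclecloud.com | sed -n 's/^[Dd]ate: //p' | tr -d '\r')
#   skew=$(skew_seconds "$server_date" "$(date +%s)")
#   skew_exceeds_limit "$skew" && echo "Clock skew of ${skew}s exceeds 5 minutes"
```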
Troubleshoot: OCI Management Agent Service: Agent Not Visible In OCI Console Under Observability & Management
The OCI Management Agent is installed successfully on a Compute Instance and is running on the host. However, the Agent does not appear in the Oracle Cloud Console when you open the navigation menu, select Observability & Management, go to Management Agents, and then select Agents.
Possible Cause: The Compute Instance and the Agent Install Key are in different compartments.
Action:
- Stop and uninstall the Management Agent on the Compute Instance.
- Create an Agent Install Key in the same compartment as the Compute Instance.
- Install the Management Agent using the new install key you just created.
Troubleshoot Management Gateways
This topic covers common issues and solutions related to Management Gateways.
- Troubleshoot: Remove Management Gateway
- Troubleshoot: Configure Management Gateway
- Troubleshoot: Management Gateway installation fails on Red Hat Enterprise Linux 9.x
- Troubleshoot: Management Gateway Installation Fails With Error: Certificates could not be created and the Identity logs report: Authentication failed: DATE_OUTSIDE_CLOCK_SKEW
- Troubleshoot: When installing or configuring Management Gateway, Timed Out Error
Troubleshoot: Remove Management Gateway
Cause: In some cases, you may need to remove an existing Management Gateway installation in order to reinstall it.
- Check if the gateway is running:
For OL7:
systemctl status mgmt_gateway
For OL6:
/sbin/initctl status mgmt_gateway
If the gateway is running, stop it:
For OL7:
systemctl stop mgmt_gateway
For OL6:
/sbin/initctl stop mgmt_gateway
- Remove the installed Gateway RPM using the following command:
rpm -e oracle.mgmt_gateway --noscripts
- Remove any remaining Gateway files using the following command:
rm -rf /opt/oracle/mgmt_agent
- Remove the gateway service file:
For OL7: rm -rf /etc/systemd/system/mgmt_gateway.service
For OL6: rm -rf /etc/init/mgmt_agent.conf
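The OL7 and OL6 variants above differ only in the service manager. The following sketch picks the stop command for the host's init system. The helper names are hypothetical. It assumes that systemd hosts (such as OL7) have the /run/systemd/system directory and that Upstart hosts (such as OL6) provide /sbin/initctl. The optional root-prefix argument exists only to make the function testable.

```shell
# Report the init system: systemd, upstart, or unknown.
# $1 is an optional root prefix (empty on a real host).
detect_init_system() {
  root=${1:-}
  if [ -d "$root/run/systemd/system" ]; then
    echo systemd
  elif [ -x "$root/sbin/initctl" ]; then
    echo upstart
  else
    echo unknown
  fi
}

# Print (not run) the command that stops mgmt_gateway on this host.
gateway_stop_command() {
  case $(detect_init_system "${1:-}") in
    systemd) echo "systemctl stop mgmt_gateway" ;;
    upstart) echo "/sbin/initctl stop mgmt_gateway" ;;
    *) echo "unsupported init system" >&2; return 1 ;;
  esac
}

# Example: review the printed command, then run it as root.
#   gateway_stop_command
```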
Troubleshoot: Configure Management Gateway
Cause: In some cases, the hostname might not resolve in the installation environment, which can cause the installation to fail with the following error message:
Troubleshoot: "Could not resolve hostname <hostname value> in the installation environment. Resolve the hostname or provide the GatewayCertCommonName in the response file and rerun the gateway setup script."
Action:
- Check that the hostname of the environment resolves, and get its fully qualified domain name (FQDN), by running the following command:
hostname -f
- Optionally, provide a custom FQDN for the gateway configuration by setting the GatewayCertCommonName property in the input response file. See Response File Parameters.
- Re-run the gateway configuration script:
sudo /opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<user_home_directory>/gateway.rsp
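The checks above can be scripted as shown in this minimal sketch, which assumes a host with getent and a response file that uses Key = value lines. hostname_resolves tests whether a name resolves. set_gateway_common_name appends GatewayCertCommonName only if the response file does not already set it. Both function names are hypothetical.

```shell
# Succeeds if the name resolves through the system resolver (uses getent).
hostname_resolves() {
  getent hosts "$1" >/dev/null 2>&1
}

# Hypothetical helper: add GatewayCertCommonName to a response file once.
# The FQDN defaults to the output of hostname -f.
set_gateway_common_name() {
  rsp_file=$1
  fqdn=${2:-$(hostname -f)}
  if grep -q '^[[:space:]]*GatewayCertCommonName[[:space:]]*=' "$rsp_file"; then
    echo "GatewayCertCommonName already set in $rsp_file" >&2
    return 0
  fi
  printf 'GatewayCertCommonName = %s\n' "$fqdn" >> "$rsp_file"
}

# Example:
#   hostname_resolves "$(hostname -f)" || echo "Hostname does not resolve"
#   set_gateway_common_name <user_home_directory>/gateway.rsp gateway.example.com
```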
Cause: In some cases, the Management Gateway installation might fail with the following error message because the required policies are missing in OCI or because of resource limits in the tenancy. If you see the following error, follow the steps below.
Troubleshoot: "Failed to start Management Gateway as certificates could not be created, initialized or retrieved in OCI. Please check the logs for more details."
Action:
- Open the log file in the Management Gateway installation directory,
for example:
/opt/oracle/mgmt_agent/plugins/GatewayProxy/statedir/log/mgmt_gateway.log
- If the log file contains a 404 error such as the following, choose one of the following options to resolve the issue:
2023-07-25 15:38:06.694/CEST [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Response String { "code" : "NotAuthorizedOrNotFound", "message" : "Authorization failed or requested resource not found."} 2023-07-25 15:38:06.696/CEST [pool-3-thread-1] ERROR com.oracle.mgmtagent.proxy.ProxyServer - Error while initializing and loading certificate bundlescom.oracle.mgmtagent.proxy.exception.CertificateFailureException: The response status is 404 after multiple retries at com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility.executeRequest(CertificateUtility.java:293) ~
- Recommended option: Use the Management Gateway Quick Start Marketplace Application to automatically create the dynamic groups, policies, and manage the certificates required to install Management Gateway.
- Manually add the dynamic groups and policies required for installing the Management Gateway, and confirm that they are added to the specific compartment within the tenancy where you want to install the Management Gateway. For more information, see Perform Prerequisites for Deploying Management Gateway.
- If the log file contains a 400 error such as the following, review the following options to resolve the issue:
2023-09-20 18:51:32.772/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateCreationUtil - Create Vault Service Url invoked https://kms.us-ashburn-1.oraclecloud.com/20180608/vaults 2023-09-20 18:51:33.400/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Received response code 400 2023-09-20 18:51:33.400/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Header name opc-request-id , value /5704D03441842D3818B824B2D6B2712E/1D1FED893474FDA900188E24F3DEE59B 2023-09-20 18:51:33.401/GMT [pool-3-thread-1] INFO com.oracle.mgmtagent.proxy.oci.certificate.util.CertificateUtility - Response String { "code" : "LimitExceeded", "message" : "The limit for this tenancy has been exceeded."}
- Check the limit for the Default Vault Count resource for the Key Management Service in the OCI Console. If needed, raise a request to increase the resource limits. For more information, see Managing Keys and Managing Vaults.
- Alternatively, set up certificates manually. For details, see Perform Prerequisites for Deploying Management Gateway and go to the Manual Certificate Management section.
Note
When you create the Issued by internal CA certificates, the Certificate Profile must be either TLS Server or TLS Client and only the RSA signing algorithms are supported.
- If there are any other failures related to the Vault or Key service APIs in the logs, raise a request and reach out to the oci_kms team, providing the response body and opc-request-id.
- If there are any other failures related to the Certificate Authorities or Certificates service APIs in the logs, raise a request and reach out to the oci_certificates team, providing the response body and opc-request-id.
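To decide which of the options above applies, you can search the gateway log for the error codes in the sample log lines. This is a minimal sketch. classify_gateway_cert_error is a hypothetical helper, and it recognizes only the NotAuthorizedOrNotFound (404) and LimitExceeded (400) codes shown above. Anything else is reported as "other", and those failures should go to the oci_kms or oci_certificates team.

```shell
# Print "policies" for the 404 NotAuthorizedOrNotFound case, "limits" for the
# 400 LimitExceeded case, or "other" when neither signature is present.
classify_gateway_cert_error() {
  log_file=$1
  if grep -q '"NotAuthorizedOrNotFound"' "$log_file"; then
    echo policies
  elif grep -q '"LimitExceeded"' "$log_file"; then
    echo limits
  else
    echo other
  fi
}

# Example:
#   classify_gateway_cert_error /opt/oracle/mgmt_agent/plugins/GatewayProxy/statedir/log/mgmt_gateway.log
```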
Troubleshoot: Management Gateway installation fails on Red Hat Enterprise Linux 9.x
The Management Gateway installation fails and the following error message may be displayed: mgmt_gateway service creation failed. Reason: Detected Linux
Additionally, the install failure log messages may confirm the error and indicate that the setup attempts to use an incorrect service manager to install the gateway.
Cause:
Red Hat removed the chkconfig package in the Red Hat Enterprise Linux (RHEL) 9 distribution. For more details, see the Red Hat Knowledge base.
Action:
- Confirm the environment uses Red Hat Enterprise Linux 9.x by running the following command:
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.3 (Plow)
- The messages below show the problem: the OS family was not identified correctly by the rules in the agentcore script, so on RHEL 9.x the installation attempts to set up the service using init.d instead of systemctl:
$ rpm -ivh oracle.mgmt_gateway.231118.1208.1702955171.Linux-x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Checking pre-requisites
Checking if any previous gateway service exists
Checking if OS has systemd or initd
Checking available disk space for gateway install
Checking if /opt/oracle/mgmt_agent directory exists
Checking if 'mgmt_agent' user exists
'mgmt_agent' user already exists, the gateway will proceed installation without creating a new one.
Checking Java version
Trying /omc/java/jdk1.8.0_391
Java version: 1.8.0_391 found at /omc/java/jdk1.8.0_391/bin/java
Checking agent version
Updating / installing...
1:oracle.mgmt_gateway-231118.1208.1################################# [100%]
Executing install
Unpacking software zip
Copying files to destination dir (/opt/oracle/mgmt_agent)
Initializing software from template
Checking if JavaScript engine is available to use
Creating 'mgmt_gateway' daemon
mgmt_gateway service creation failed. Reason: Detected Linux: Installing the mgmt_gateway daemon...
ln: failed to create symbolic link '/etc/init.d/mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/K20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc3.d/S20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/S20mgmt_gateway': No such file or directory
ln: failed to create symbolic link '/etc/rc5.d/K20mgmt_gateway': No such file or directory
Service not installed.
warning: %post(oracle.mgmt_gateway-231118.1208.1702955171-1.x86_64) scriptlet failed, exit status 1
- Verify that the chkconfig package is missing, as described in the Red Hat Knowledge base.
Solution 1: Install the chkconfig package
- Install the missing package by executing the following command:
$ dnf install chkconfig
- Validate the package exists in the environment by executing the
following
command:
$ rpm -qa | grep chkconfig
- Install the Management Gateway again.
Solution 2: Workaround without the chkconfig package
Note
This is a workaround. Use this solution only if the chkconfig package cannot be installed. The recommended solution is to install the chkconfig package.
If installing the chkconfig package as described in Solution 1 above is not an option, complete the following steps to install the Management Gateway software.
- Switch to a root shell.
- Set the environment variable DIST_LINUX_FAMILY_OVERRIDE="Red Hat".
- Install the Management Gateway software.
$ sudo /bin/bash
$ export DIST_LINUX_FAMILY_OVERRIDE="Red Hat"
# RPM install
$ rpm -ivh <rpm_file_name.rpm>
# ZIP install
$ ./installer.sh <full_path_of_response_file>
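The environment checks in Solution 1 can be combined into a short script. This is a minimal sketch. is_rhel9 and has_chkconfig are hypothetical helper names. is_rhel9 assumes the release string format shown in the cat /etc/redhat-release output above.

```shell
# Succeeds if the release file reports Red Hat Enterprise Linux release 9.x.
is_rhel9() {
  release_file=${1:-/etc/redhat-release}
  grep -q '^Red Hat Enterprise Linux release 9\.' "$release_file" 2>/dev/null
}

# Succeeds if the chkconfig package is installed (rpm -q exits non-zero
# when the package is not installed).
has_chkconfig() {
  rpm -q chkconfig >/dev/null 2>&1
}

# Example (as root):
#   if is_rhel9 && ! has_chkconfig; then
#     dnf install chkconfig
#   fi
```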
Troubleshoot: Management Gateway Installation Fails With Error: Certificates could not be created and the Identity logs report: Authentication failed: DATE_OUTSIDE_CLOCK_SKEW
Cause:
The Identity logs report that authentication failed with DATE_OUTSIDE_CLOCK_SKEW. This indicates that the time on the Management Gateway host differs too much from the time on the server, so the certificates cannot be created. The gateway setup output looks similar to the following:
# /opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<PATH>/gateway_agent.rsp
/opt/oracle/mgmt_agent/agent_inst/bin/setupAgent.sh opts=<PATH>/gateway_agent.rsp
Executing configure
Parsing input response file
Validating install key
Generating communication wallet
Generating security artifacts
Registering Management Gateway
Found service plugin(s):[GatewayProxy]
Starting gateway...
Gateway started successfully
Starting plugin deployment for: [GatewayProxy]
Deploying service plugin(s)...Done.
GatewayProxy : Successfully deployed external plugin
Gateway setup completed and the gateway is running.
In the future gateway can be started by directly running: sudo systemctl start mgmt_gateway
Please make sure that you delete <PATH>/gateway_agent.rsp or store it in secure location.
Creating gateway system properties file
Creating properties file
Creating or validating certificates
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Waiting for Management Gateway to create or validate certificates...
Failed to start Management Gateway as certificates could not be created, initialized or retrieved in OCI. Please check the logs for more details.
Management Gateway stopped
Action:
On the host where the Management Gateway is installed, ensure that the host time is correct, and then install the Management Gateway.
Troubleshoot: When installing or configuring Management Gateway, Timed Out Error
The OCI Console displays the Management Gateway as active, but the metrics are not populating, and the gateway setup output shows a timeout similar to the following:
/opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<user_home_directory>/gateway.rsp
Starting gateway...
Gateway started successfully
Starting plugin deployment for: [GatewayProxy]
Deploying service plugin(s)...............Timed out.
Agent is unable to check if it deployed requested service plugin(s) successfully or not.
Please check back later on the console.
Cause: A network communication issue can make the Management Gateway setup task take longer than expected, which may cause the setup to time out.
Action:
- Confirm there are no network communication issues.
- Check the proxy details in the response file to determine whether a proxy issue exists. For example, confirm that the correct proxy host and port are set in the response file:
ProxyHost = my.proxyhost.com
ProxyPort = 80
- Stop the Management Gateway using the following command:
systemctl stop mgmt_gateway
- Re-run the Management Gateway setup using the following
command:
/opt/oracle/mgmt_agent/agent_inst/bin/setupGateway.sh opts=<user_home_directory>/gateway.rsp
- The Management Gateway setup should now complete successfully and the metrics should populate.
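The proxy check above can be partly automated. The idea is to read ProxyHost and ProxyPort from the response file and test whether the proxy accepts a TCP connection. This sketch depends on several assumptions:
- rsp_value and proxy_reachable are hypothetical helpers.
- The response file uses Key = value lines.
- proxy_reachable requires bash (for /dev/tcp) and the coreutils timeout command.

A successful TCP connection does not prove that the proxy forwards traffic to OCI.

```shell
# Print the value of a "Key = value" entry (last occurrence wins), trimmed.
rsp_value() {
  rsp_file=$1
  key=$2
  sed -n "/^[[:space:]]*$key[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$rsp_file" | tail -n 1
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
proxy_reachable() {
  host=$1
  port=$2
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example:
#   rsp=<user_home_directory>/gateway.rsp
#   proxy_reachable "$(rsp_value "$rsp" ProxyHost)" "$(rsp_value "$rsp" ProxyPort)" || echo "Proxy not reachable"
```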