Troubleshoot Stack Monitoring

The topics in this section provide troubleshooting information to identify and address common issues that may occur while working with Stack Monitoring.

Troubleshoot General Issues

In some cases, you may need to review the Management Agent logs for additional details:

For the Management Agent plug-in on Oracle Cloud Agent (OCA) on OCI Compute Instances:

/var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/log/mgmt_agent.log

For the Standalone Management Agent (manually installed agent):

/opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log
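
To check which of these log files exists on a host and view its most recent entries, a minimal Python sketch such as the following may help. It assumes the script runs as a user with read access to the agent directories. The helper names find_agent_logs and tail are hypothetical and are not part of any Oracle tooling.

```python
from collections import deque
from pathlib import Path

# Default log locations listed above, keyed by agent type (labels are illustrative).
LOG_PATHS = {
    "oca_plugin": Path("/var/lib/oracle-cloud-agent/plugins/oci-managementagent/polaris/agent_inst/log/mgmt_agent.log"),
    "standalone": Path("/opt/oracle/mgmt_agent/agent_inst/log/mgmt_agent.log"),
}


def find_agent_logs(paths=LOG_PATHS):
    """Return {agent_type: path} for each listed log file that exists on this host."""
    return {name: path for name, path in paths.items() if path.is_file()}


def tail(path, n=50):
    """Return the last n lines of a text file without keeping the whole file in memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    found = find_agent_logs()
    if not found:
        print("No Management Agent log found at the default locations.")
    for agent_type, path in found.items():
        print(f"== {agent_type}: {path}")
        print("".join(tail(path)), end="")
```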

New permissions in resource-types are not propagated

This happens because IAM does not recompile a policy unless there is a change to the policy statement.

For any existing policy that uses a resource-type, when new permissions are added to that resource-type, edit the policy by adding a blank space, and then save it. This change causes IAM to recompile the policy.

For more information, see New permissions in resource-types are not propagated.

Invalid Tags error

This happens when a Tag Key Definition whose Value Type is List includes a tag variable as one of its values. Assigning such a tag to a resource works initially, but validation fails during later actions, such as a refresh or assigning a new tag, and returns the error Invalid tags.

Correct Usage:

  • Tag variables can be used in default tags, but they are not supported in defined tags with predefined values (lists).
  • A Tag Key Definition cannot include tag variables as predefined list values.
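
Before saving a Tag Key Definition with Value Type set to List, you could screen its predefined values for tag variables. This Python sketch assumes tag variables use the ${...} form (for example ${iam.principal.name}); the function name values_with_tag_variables is hypothetical.

```python
import re

# Assumed tag-variable syntax: ${namespace.name}, for example ${iam.principal.name}.
TAG_VARIABLE = re.compile(r"\$\{[^}]+\}")


def values_with_tag_variables(predefined_values):
    """Return the predefined list values that embed a tag variable (not supported in list-type tags)."""
    return [value for value in predefined_values if TAG_VARIABLE.search(value)]


if __name__ == "__main__":
    allowed = ["prod", "dev", "${iam.principal.name}"]
    bad = values_with_tag_variables(allowed)
    if bad:
        print("Remove tag variables from the list values:", bad)
```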

For more information, see Tagging.

Troubleshoot a Maintenance Window

Retry a Maintenance Window

For Active Maintenance Windows, a retry can be performed only after an operation is marked as Partial Success.

The Retry option is available from the Maintenance Window's actions menu.

Updated topology

When a resource's topology changes, for example when servers are added to or removed from a cluster, the Maintenance Window is not updated automatically. To update the resources included in the Maintenance Window after a topology change, edit the Maintenance Window to match the resource's new topology.
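
To decide what to change when editing the Maintenance Window, it can help to compare the members it currently covers with the cluster's current members. The following Python sketch assumes you collect both lists of resource names yourself (for example, from the console); topology_changes is a hypothetical helper and does not call any Stack Monitoring API.

```python
def topology_changes(window_members, current_members):
    """Compare the window's resources with a cluster's current members.

    Returns (to_add, to_remove): members that joined the cluster but are not in the
    window, and window entries for members that have left the cluster.
    """
    window, current = set(window_members), set(current_members)
    return sorted(current - window), sorted(window - current)


if __name__ == "__main__":
    to_add, to_remove = topology_changes(["srv1", "srv2"], ["srv2", "srv3"])
    print("Add to window:", to_add)          # Add to window: ['srv3']
    print("Remove from window:", to_remove)  # Remove from window: ['srv1']
```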

Maintenance Window stuck in "Creating" state

If a Maintenance Window has been stuck in the "Creating" state for more than 10 minutes, you can stop it by selecting Stop from the 3-dot menu.

If the Maintenance Window has been stuck for less than 10 minutes, stopping the creation process is not allowed.
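
The 10-minute rule above can be expressed as a simple check, for example in a script that watches for stuck windows. This is a sketch only: the "CREATING" state string is an assumed spelling, and can_stop_creation is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Documented threshold: Stop is allowed only after more than 10 minutes in Creating.
STOP_THRESHOLD = timedelta(minutes=10)


def can_stop_creation(state, time_created, now=None):
    """Return True if a window stuck in the Creating state may be stopped.

    state: lifecycle state string; "CREATING" is an assumed spelling.
    time_created: timezone-aware datetime when creation started.
    """
    now = now or datetime.now(timezone.utc)
    return state == "CREATING" and now - time_created > STOP_THRESHOLD
```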

Troubleshoot Policy Manager

Policy quota reached

Ensure that new policies can be created in the tenancy, or use existing policies (the policies must exist in both the current compartment and the root compartment). To allow new policies to be created, clean up outdated policies in the tenancy or work with Oracle to increase the policy limits. Once new policies can be created, retry the setup.

Compute auto activation cannot be created

Policy Manager expects Stack Monitoring configurations to be in the ACTIVE state, because only one such configuration can exist in a compartment. If a configuration in the compartment is in an unexpected state, creating the configuration fails.

Clean up configurations that are in an invalid state in the current compartment. Use the public SDK or CLI to perform the cleanup, for example with the delete command.
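
Before deleting anything, you can narrow the list to the configurations that are actually in an unexpected state. The Python sketch below assumes you have already listed the configurations with the SDK or CLI and converted each one to a dict with id, compartment_id, and lifecycle_state keys; that shape and the name configs_to_clean_up are assumptions. Delete the returned IDs with the SDK or CLI.

```python
def configs_to_clean_up(configs, compartment_id):
    """Return IDs of configurations in the given compartment that are not in the ACTIVE state.

    configs: list of dicts with "id", "compartment_id", and "lifecycle_state" keys
    (an assumed shape, built from your SDK or CLI list output).
    """
    return [
        c["id"]
        for c in configs
        if c["compartment_id"] == compartment_id and c["lifecycle_state"] != "ACTIVE"
    ]
```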

Troubleshoot a host

Windows host discovery failure

Error:

[host] Discovery failure: ExecutionException: FetchletException: Process invocation failure: java.io.IOException: 
Cannot run program "powershell.exe": CreateProcess error=2, 
The system cannot find the file specified due to FetchletException: Process invocat...;

Solution:

Perform the following steps to add PowerShell to the Path system variable on the Windows host:

  1. Open the System Control Panel, and select System. Select Advanced System Settings, go to the Advanced tab, and select Environment Variables.
  2. Under System Variables, edit Path, and ensure that the below path has been added:
    %SYSTEMROOT%\System32\WindowsPowerShell\v1.0\
  3. Retry discovery.
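
To confirm step 2 before retrying, a small Python check can look for the PowerShell directory in a Path value, expanding %SYSTEMROOT% and comparing case-insensitively as Windows does. This is a sketch under the assumption that the Path value you pass in is the one the agent sees; the PATH of the process running the script can differ from the system variable. The function names are hypothetical.

```python
import ntpath
import os
import re

POWERSHELL_DIR = r"%SYSTEMROOT%\System32\WindowsPowerShell\v1.0"


def _expand_systemroot(entry, system_root):
    # Windows variable names are case-insensitive (%SystemRoot% is the same as %SYSTEMROOT%).
    # A function replacement stops re.sub from interpreting backslashes in the path.
    return re.sub(r"%systemroot%", lambda _m: system_root, entry, flags=re.IGNORECASE)


def _normalize(entry, system_root):
    # Expand, drop trailing backslashes, and lowercase for a Windows-style comparison.
    return ntpath.normcase(_expand_systemroot(entry.strip(), system_root).rstrip("\\"))


def powershell_on_path(path_value, system_root=r"C:\Windows"):
    """Return True if the PowerShell directory appears in a semicolon-separated Path value."""
    expected = _normalize(POWERSHELL_DIR, system_root)
    return any(
        _normalize(entry, system_root) == expected
        for entry in path_value.split(";")
        if entry.strip()
    )


if __name__ == "__main__":
    ok = powershell_on_path(os.environ.get("PATH", ""), os.environ.get("SYSTEMROOT", r"C:\Windows"))
    print("PowerShell directory on Path:", ok)
```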

Troubleshoot EBS

EBS Database with Edition-Based Redefinition (EBR)

Solution: For EBS instances with EBR enabled, refresh the EBS resource each time a new edition is created in the database. The refresh replaces the Management Agent's stale connections to old editions so that metric collection continues. If the resource is not refreshed, metric data stops being collected.

EBS WebLogic Discovery Fails: "Unexpected Exception due to IOException"

Possible cause: Incorrect Management Agent or agent host credentials were selected for the Resource Discovery, so a connection to perform the discovery operation could not be established.

Solution: Re-enter the agent and host details and retry. If no Management Agent exists, install a Management Agent (see Install Management Agent) and retry the Resource Discovery task.

Troubleshoot OUD

The OUD exporter logs can be found under <CUSTOM_EXPORTER_DIRECTORY_PATH>/logs.

If the OUD exporter was set up successfully but no metrics are being uploaded to the Telemetry service, check the Management Agent logs, located under <MANAGEMENT_AGENT_INSTALLATION_DIRECTORY>/log, and search for your OUD <RESOURCE_NAME>.
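
The search described above can be scripted with a short Python sketch that scans the log directory for lines mentioning the resource name. It assumes the logs are plain-text files matching *.log; lines_mentioning and the script name in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def lines_mentioning(log_dir, resource_name, pattern="*.log"):
    """Yield (file name, line number, text) for log lines that contain resource_name."""
    for log_file in sorted(Path(log_dir).glob(pattern)):
        if not log_file.is_file():
            continue
        with open(log_file, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if resource_name in line:
                    yield log_file.name, lineno, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_oud.py <MANAGEMENT_AGENT_INSTALLATION_DIRECTORY>/log <RESOURCE_NAME>
    for name, lineno, text in lines_mentioning(sys.argv[1], sys.argv[2]):
        print(f"{name}:{lineno}: {text}")
```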

Troubleshoot PeopleSoft

Discovery Job Behavior

When running a PeopleSoft discovery job, each Process Scheduler Domain work item generates a log. Logs detail successes and errors (such as a domain being down). Each log entry includes a Work Item ID for easy tracking.

Discovery Error Messages

Database validation failed error

When a discovery job fails, use the Work Item (WI) ID to search for detailed messages. If your database shows a status of Not Reporting, make sure the monitoring user account has not expired. If it has expired, reset its password.

Common errors and their fixes include:

Invalid Credentials:

  • Error: Invalid username/password, logon denied
  • Cause: Incorrect username or password.
  • Solution: Re-enter correct credentials in the Database Credentials section.

Hostname Errors:

  • Error: IO Error: The Network Adapter could not establish a connection due to UnknownHostException. Name or service not known
  • Cause: Incorrect or misspelled hostname in the PSFT Database section.
  • Solution: Correct the hostname and retry the discovery job.

Connection Failure:

  • Error: Connection refused, socket connect lapse
  • Cause: Incorrect port number.
  • Solution: Enter the correct database port and retry.

DBSNMP Password Error:

  • Error: Failed to connect: java.sql.SQLException: ORA-01017: invalid username/password
  • Cause: The DBSNMP password begins with a numeric character, which does not meet Stack Monitoring's requirements.
  • Solution: Change the DBSNMP password so that it begins with an alphabetic character or an underscore. Then update your monitoring configuration files or settings with the new password and run the discovery again.

Listener Error:

  • Error: Listener refused the connection with the following error: ORA-12514, TNS: listener does not currently know of service requested in connect descriptor
  • Cause: Wrong Database Service Name.
  • Solution: Enter the correct Database Service Name in the PSFT Database section.

Process Scheduler Domain resources are showing as down:

  • Error: The agent log shows the error: “WARN - failed to connect for cache: url service:jmx:rmi:///jndi/rmi://<ps_domain_host>:<admin_port>/<domain_name>/DomainRuntime/DefaultConnector”
  • Cause: The JMX connection from the Management Agent host to the Process Scheduler Domain failed because the Domain Admin Port number changed.
  • Solution: Restart the domain in psadmin, and then refresh PeopleSoft as described under PeopleSoft Refresh.
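
When triaging many work items, the error list above can be turned into a lookup that matches an error message against the documented error texts. The following Python sketch is illustrative only: the matching substrings are copied from the messages above, the labels and function names are hypothetical, and a message can match more than one entry. It also includes a check for the documented DBSNMP password rule (the first character must be alphabetic or an underscore).

```python
# (label, substring to look for, remedy); substrings are copied from the error texts above.
KNOWN_ERRORS = [
    ("Invalid Credentials", "logon denied",
     "Re-enter correct credentials in the Database Credentials section."),
    ("Hostname Error", "UnknownHostException",
     "Correct the hostname in the PSFT Database section and retry."),
    ("Connection Failure", "Connection refused",
     "Enter the correct database port and retry."),
    ("DBSNMP Password", "ORA-01017",
     "Make sure the DBSNMP password starts with a letter or underscore, then rediscover."),
    ("Listener Error", "ORA-12514",
     "Enter the correct Database Service Name in the PSFT Database section."),
    ("Process Scheduler Domain Down", "failed to connect for cache",
     "Restart the domain in psadmin and refresh PeopleSoft."),
]


def classify(message):
    """Return every (label, remedy) pair whose substring appears in message (case-insensitive)."""
    text = message.lower()
    return [(label, remedy) for label, needle, remedy in KNOWN_ERRORS if needle.lower() in text]


def dbsnmp_password_ok(password):
    """Check the documented rule: the password must start with a letter or an underscore."""
    return bool(password) and (password[0].isalpha() or password[0] == "_")
```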

Resource families validation failed error

PeopleSoft has the following resource families:

  • Application Server Domain
  • Process Scheduler Domain
  • PeopleSoft Internet Architecture (PIA)

A discovery job can include several resources of each family. The job is marked as successful if at least one resource of each family is discovered successfully, so a job can succeed even when work items fail for some child resources.
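
The success rule above can be modeled in a few lines of Python, which may help when reviewing a list of work item results. This sketch only restates the documented rule; the (family, succeeded) input shape and the name discovery_job_succeeded are assumptions, and Stack Monitoring performs the actual evaluation.

```python
# Resource families listed above; each one needs at least one successful work item.
FAMILIES = frozenset({
    "Application Server Domain",
    "Process Scheduler Domain",
    "PeopleSoft Internet Architecture (PIA)",
})


def discovery_job_succeeded(work_items):
    """work_items: iterable of (family, succeeded) pairs, one per discovered resource."""
    succeeded_families = {family for family, succeeded in work_items if succeeded}
    return FAMILIES <= succeeded_families
```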

"Discovery failed for oracle_psft_appserv" (also applies to oracle_psft_pcrs):

  • Cause: Invalid Credentials
  • Solution: Enter the correct credentials.

Failed to retrieve NameNotFoundException

  • Cause: Domain Down
  • Solution: Ensure the application/domain is running in the PeopleSoft console, and restart if necessary.

PIA Domain Misconfiguration

  • Cause: The PIA domain is down or misconfigured.
  • Solution: Fix the PIA domain configuration.

Elasticsearch errors

If Elasticsearch is discovered as part of PeopleSoft discovery, the outcome of its work item determines whether the PeopleSoft discovery job succeeds: if an error occurs while discovering Elasticsearch and the work item fails, the PeopleSoft discovery job fails as well.

500 SERVER ERROR:

  • Cause: Failed to collect data due to invalid username.
  • Solution: Enter the correct username.

401 Unauthorized Access:

  • Cause: Invalid credentials.
  • Solution: Ensure the correct password is provided.

FileNotFoundException:

  • Cause: TrustStore file path is incorrect or file is missing.
  • Solution: Correct the TrustStore path and ensure the file is accessible by the agent host.

Troubleshoot SOA

Monitoring SOA applications created from Marketplace images:

When a SOA application is provisioned from a Marketplace image, the SOA-related metrics are not populated. The Marketplace image installs SOA Suite in a different location than the WebLogic stack, so the SOA and WebLogic configuration files are in two separate locations:

/u01/app/oracle/middleware (WebLogic)
/u01/app/oracle/suite/ (SOA Suite)

To populate the SOA metrics, copy the following files from the SOA Suite directory to the WebLogic directory, and then restart WebLogic. SOA Infra metrics start appearing a few minutes after WebLogic restarts.

From: /u01/app/oracle/suite/em/adml

-rwxrwxr-x. 1 oracle oracle 21156 May 18 2011 server-scheduler_service.xml
-rwxrwxr-x. 1 oracle oracle 15788 May 18 2011 domain-scheduler_service.xml
-rwxrwxr-x. 1 oracle oracle 2929 Nov 11 2013 server-bea_alsb.xml
-rwxrwxr-x. 1 oracle oracle 242238 Feb 28 2016 server-oracle_soainfra.xml
-rwxrwxr-x. 1 oracle oracle 232504 Jul 10 2016 server-oracle_soainfra_partition.xml
-rwxrwxr-x. 1 oracle oracle 2992 Aug 15 2016 server-oracle_soa_composite-11.0.xml
-rwxrwxr-x. 1 oracle oracle 95241 Jan 16 2017 domain-oracle_soainfra.xml

To: /u01/app/oracle/middleware/em/adml
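
The copy step can be scripted so that nothing is copied unless every listed file and the target directory are present. This Python sketch assumes the default Marketplace paths shown above; copy_soa_metadata is a hypothetical name. shutil.copy2 preserves permissions and timestamps but not ownership, so run it as the oracle user that owns the files.

```python
import shutil
from pathlib import Path

SOURCE_DIR = Path("/u01/app/oracle/suite/em/adml")       # SOA Suite location (from above)
TARGET_DIR = Path("/u01/app/oracle/middleware/em/adml")  # WebLogic location (from above)

SOA_METADATA_FILES = [
    "server-scheduler_service.xml",
    "domain-scheduler_service.xml",
    "server-bea_alsb.xml",
    "server-oracle_soainfra.xml",
    "server-oracle_soainfra_partition.xml",
    "server-oracle_soa_composite-11.0.xml",
    "domain-oracle_soainfra.xml",
]


def copy_soa_metadata(source=SOURCE_DIR, target=TARGET_DIR, files=SOA_METADATA_FILES):
    """Copy the listed files from source to target and return the names copied.

    Everything is checked first, so no file is copied if the target directory
    or any source file is missing.
    """
    source, target = Path(source), Path(target)
    if not target.is_dir():
        raise FileNotFoundError(f"target directory not found: {target}")
    missing = [name for name in files if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {source}: {', '.join(missing)}")
    for name in files:
        # copy2 keeps mode bits and timestamps; ownership follows the user running the script.
        shutil.copy2(source / name, target / name)
    return list(files)
```

After copy_soa_metadata() returns, restart WebLogic; the SOA Infra metrics should appear a few minutes later.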