5 Debugging in Coherence
This chapter includes the following sections:
- Overview of Debugging in Coherence
Coherence applications are typically developed on a single computer. A cache server and application are started within an IDE and the application is debugged as required.
- Configuring Logging
Coherence has its own logging framework and also supports the use of Log4j2, SLF4J, and Java logging to provide a common logging environment for an application.
- Performing Remote Debugging
Java Debug Wire Protocol (JDWP) provides the ability to debug a JVM remotely. Most IDE tools support JDWP and are used to connect to a remote JVM that has remote debugging enabled. See your IDE's documentation for instructions on how to connect to a remote JVM.
- Distributed Tracing
Coherence can use OpenTelemetry or OpenTracing APIs to give developers visibility into the cache operations within the cluster.
- Troubleshooting Coherence-Based Applications
Troubleshooting Coherence-based applications is, for the most part, no different than troubleshooting other Java applications.
Overview of Debugging in Coherence
Ideally, most errors can be detected during development by using logging, enabling JVM debug options, and capturing thread and heap dumps as required. Moreover, IDEs and profiling tools, such as Oracle's VisualVM and JConsole, provide features for diagnosing problems. However, Coherence applications must eventually be tested in a more distributed environment. Debugging and troubleshooting in the testing environment is more difficult because data and processes are fully distributed across the cluster and the network affects the application. Remote debugging with Java Debug Wire Protocol (JDWP), together with Coherence's JMX management and reporting capabilities, facilitates debugging and troubleshooting in a distributed environment.
Using Oracle Support
My Oracle Support can help debug issues. When sending support an issue, always include the following items in a compressed file:
- application code
- configuration files
- log files for all cluster members
- thread and heap dumps, which are required under certain circumstances: send thread dumps if the application is running slowly or appears to be hung, and send heap dumps if the application runs out of memory or consumes more memory than expected.
Configuring Logging
This section includes the following topics:
- Changing the Log Level
- Changing the Log Destination
- Sending Log Messages to a File
- Changing the Log Message Format
- Setting the Logging Character Limit
- Using JDK Logging for Coherence Logs
- Mapping JDK Log Levels with Coherence Log Levels
- Using Log4j2 Logging for Coherence Logs
- Mapping Log4j2 Log Levels with Coherence Log Levels
- Using SLF4J for Coherence Logs
Changing the Log Level
The logger's log level determines which log messages are emitted. The default log level emits error, warning, informational, and some debug messages. During development, the log level should be raised to its maximum setting to ensure all debug messages are logged. The following log levels are available:
- 0 – This level includes messages that are not associated with a logging level.
- 1 – This level includes the previous level's messages plus error messages.
- 2 – This level includes the previous levels' messages plus warning messages.
- 3 – This level includes the previous levels' messages plus informational messages.
- 4-9 – These levels include the previous levels' messages plus internal debugging messages. More log messages are emitted as the log level is increased. The default log level is 5.
- -1 – No log messages are emitted.
To change the log level, edit the operational override file and add a <severity-level> element, within the <logging-config> element, that includes the level number. For example:
...
<logging-config>
   ...
   <severity-level system-property="coherence.log.level">9</severity-level>
   ...
</logging-config>
...
The coherence.log.level system property can be used to specify the log level instead of using the operational override file. For example:
-Dcoherence.log.level=9
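The levels listed above behave like a threshold: a message is emitted when its severity number is less than or equal to the configured level, so -1 suppresses everything and 9 emits everything. The following Java sketch restates that rule for illustration only; LogLevelRule and isEmitted are hypothetical names, not part of the Coherence API.

```java
/** Hypothetical model of the documented log-level rule; not a Coherence API. */
class LogLevelRule {
    static final int NONE = -1; // no messages are emitted
    static final int MAX = 9;   // most verbose debug level

    /**
     * Returns true if a message of the given severity (0 = unleveled, 1 = error,
     * 2 = warning, 3 = info, 4-9 = debug) passes the configured log level.
     */
    static boolean isEmitted(int messageSeverity, int configuredLevel) {
        if (configuredLevel < NONE || configuredLevel > MAX) {
            throw new IllegalArgumentException("log level must be -1..9: " + configuredLevel);
        }
        return messageSeverity <= configuredLevel;
    }

    public static void main(String[] args) {
        int level = 5; // the documented default
        for (int severity = 0; severity <= MAX; severity++) {
            System.out.println("severity " + severity + " emitted at level " + level + ": "
                    + isEmitted(severity, level));
        }
    }
}
```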
Changing the Log Destination
The logger can be configured to emit log messages to several destinations. For console output, either stdout or stderr (the default) can be used. The logger can also emit messages to a specified file.
Coherence also supports the use of JDK, Log4j2, and SLF4J to allow an application and Coherence to share a common logging framework. See Using JDK Logging for Coherence Logs, Using Log4j2 Logging for Coherence Logs, and Using SLF4J for Coherence Logs, respectively.
To change the log destination, edit the operational override file and add a <destination> element, within the <logging-config> element, that includes the destination. For example:
...
<logging-config>
   <destination system-property="coherence.log">stdout</destination>
   ...
</logging-config>
...
The coherence.log system property can be used to specify the log destination instead of using the operational override file. For example:
-Dcoherence.log=stdout
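Because coherence.log and coherence.log.level are system properties, an application that starts Coherence in-process could also set them in code. This is a minimal sketch under the assumption that the properties are read when Coherence loads its operational configuration, so they must be set before the first Coherence call; CoherenceLogProperties and applyDefaults are hypothetical names. Values already supplied with -D on the command line are left alone.

```java
import java.util.Properties;

/** Hypothetical helper that sets Coherence logging properties before Coherence starts. */
class CoherenceLogProperties {
    /**
     * Applies the destination and level only if they were not already supplied,
     * so -D options on the command line keep precedence over defaults chosen in code.
     */
    static void applyDefaults(Properties props, String destination, int level) {
        if (props.getProperty("coherence.log") == null) {
            props.setProperty("coherence.log", destination);
        }
        if (props.getProperty("coherence.log.level") == null) {
            props.setProperty("coherence.log.level", Integer.toString(level));
        }
    }

    public static void main(String[] args) {
        // Assumed to run before any Coherence class reads its operational configuration.
        applyDefaults(System.getProperties(), "stdout", 9);
        System.out.println("coherence.log=" + System.getProperty("coherence.log"));
        System.out.println("coherence.log.level=" + System.getProperty("coherence.log.level"));
    }
}
```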
Sending Log Messages to a File
The logger can be configured to emit log messages to a file by providing a path and file name in the <destination> element. The specified path must already exist, and the directory must be accessible and writable. Output is appended to the file for the life of the process, and there is no size limit. Processes cannot share a log file, and the log file is replaced when a process is restarted. Sending log messages to a file is typically used during development and testing and is useful if the log messages need to be sent to Oracle Support.
The following example specifies a log file named coherence.log that is written to the /tmp directory:
...
<logging-config>
   <destination system-property="coherence.log">/tmp/coherence.log</destination>
   ...
</logging-config>
...
Changing the Log Message Format
The default format of log messages can be changed depending on the amount of detail that is required. A log message can include static text as well as any of the following parameters that are replaced at run time.
Note:
Changing the log message format must be done with caution because critical information (such as the member or thread) can be lost, which makes issues harder to debug.
Parameter | Description |
---|---|
| This parameter shows the date/time (to a millisecond) when the message was logged. |
| This parameter shows the amount of time that the cluster member has been operational. |
| This parameter shows the product name and license type. |
| This parameter shows the Coherence version and build details. |
| This parameter shows the logging severity level of the message. |
| This parameter shows the name of the thread that logged the message. |
| This parameter shows the cluster member ID (if the cluster is currently running). |
| This parameter shows the full cluster member identification: cluster-name, site-name, rack-name, machine-name, process-name, and member-name (if the cluster is currently running). |
| This parameter shows the specified role of the cluster member. |
| This parameter shows the text of the message. |
| This parameter shows the Execution Context ID (ECID). The ECID is a globally unique ID that is attached to requests between Oracle components. The ECID is an Oracle-specific diagnostic feature used to correlate log messages across log files from Oracle components and products, and to track log messages that pertain to the same request within a single component when multiple requests are processed in parallel. Coherence clients that want to include the ECID in their logs must have an activated Dynamic Monitoring Service (DMS) execution context when invoking Coherence. Note: If JDK logging is used with an Oracle Diagnostic Logging (ODL) handler, then the |
To change the log message format, edit the operational override file and add a <message-format> element, within the <logging-config> element, that includes the format. Because the format is XML element content, a literal < must be escaped as &lt; (the example also escapes > as &gt;). For example:
...
<logging-config>
   ...
   <message-format>[{date}] &lt;{level}&gt; (thread={thread}) -->{text}</message-format>
   ...
</logging-config>
...
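To see how a format template turns into a log line, the following sketch expands {name} tokens from a map of values. It is a hypothetical model, not Coherence's formatter: MessageFormatSketch, expand, and the sample values are assumptions, and the template is the example above after XML unescaping. Unknown tokens are left as-is.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of {name} token substitution in a log message format. */
class MessageFormatSketch {
    private static final Pattern TOKEN = Pattern.compile("\\{(\\w+)\\}");

    static String expand(String format, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(format);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Replace a known token with its value; leave unknown tokens untouched.
            String value = values.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String format = "[{date}] <{level}> (thread={thread}) -->{text}";
        System.out.println(expand(format, Map.of(
                "date", "2024-01-01 00:00:00.000",
                "level", "Info",
                "thread", Thread.currentThread().getName(),
                "text", "hello")));
    }
}
```

With date=2024-01-01 00:00:00.000, level=Info, thread=main, and text=hello, the template expands to "[2024-01-01 00:00:00.000] <Info> (thread=main) -->hello". Dropping {thread} from the template would drop that information from every line, which is the risk the note above warns about.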
Setting the Logging Character Limit
The logging character limit specifies the maximum number of characters that the logger daemon processes from the message queue before discarding all remaining messages in the queue. The messages that are discarded are summarized by the logging system with a single log entry that details the number of messages that were discarded and their total size. For example:
Asynchronous logging character limit exceeded; discarding 5 log messages (lines=14, chars=968)
The truncation is only temporary; when the queue is processed (emptied), the logger is reset so that subsequent messages are logged.
Note:
The message that caused the total number of characters to exceed the maximum is never truncated.
The character limit is used to avoid situations where logging prevents recovery from a failing condition. For example, logging can add delay when timings are already tight, which causes additional failures, which in turn produce more logging. This cycle may continue until recovery is not possible. A limit on logging prevents the cycle from occurring.
To set the log character limit, edit the operational override file and add a <character-limit> element within the <logging-config> element. The character limit is specified as a positive integer or as 0, which is interpreted as Integer.MAX_VALUE. For example:
...
<logging-config>
   ...
   <character-limit system-property="coherence.log.limit">12288</character-limit>
</logging-config>
...
The coherence.log.limit system property can be used to specify the log character limit instead of using the operational override file. For example:
-Dcoherence.log.limit=12288
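The behavior described above (messages are processed until the running character count passes the limit, the crossing message is still logged whole, and the rest of the queue is replaced by one summary entry) can be modeled in a few lines. This is a simplified, hypothetical sketch rather than Coherence's logger daemon; the class name and the shortened summary text are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified, hypothetical model of the logging character limit; not Coherence code. */
class CharacterLimitSketch {
    /**
     * Drains the queue, emitting messages until the running character count exceeds
     * the limit. The message that crosses the limit is still emitted whole; everything
     * left in the queue is discarded and replaced by one summary line.
     */
    static List<String> drain(Deque<String> queue, int limit) {
        int max = (limit == 0) ? Integer.MAX_VALUE : limit; // 0 means Integer.MAX_VALUE
        List<String> emitted = new ArrayList<>();
        long chars = 0;
        while (!queue.isEmpty()) {
            String message = queue.poll();
            emitted.add(message);
            chars += message.length();
            if (chars > max && !queue.isEmpty()) {
                int discarded = queue.size();
                long discardedChars = 0;
                for (String rest : queue) {
                    discardedChars += rest.length();
                }
                queue.clear();
                emitted.add("discarding " + discarded + " log messages (chars=" + discardedChars + ")");
            }
        }
        return emitted; // the count starts again at 0 on the next drain, modeling the reset
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("first message", "second message", "third", "fourth"));
        drain(queue, 20).forEach(System.out::println);
    }
}
```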
Using JDK Logging for Coherence Logs
Applications that use the JDK logging framework can configure Coherence to use JDK logging as well. Detailed information about JDK logging is beyond the scope of this documentation. For details on JDK logging, see Java Logging Overview in Java SE Core Libraries.
To use JDK logging for Coherence logs:
Mapping JDK Log Levels with Coherence Log Levels
Table 5-1 shows how JDK log levels map to Coherence log levels.
Table 5-1 Mapping JDK Log Levels
JDK Log Level | Coherence Log Level |
---|---|
OFF | NONE |
FINEST | INTERNAL |
SEVERE | ERROR |
WARNING | WARNING |
INFO | INFO |
FINE | LEVEL_D4 |
FINER | LEVEL_D5 |
FINEST | LEVEL_D6 |
FINEST | LEVEL_D7 |
FINEST | LEVEL_D8 |
FINEST | LEVEL_D9 |
ALL | ALL |
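Read from the Coherence side, Table 5-1 tells you how fine the JDK logger must be set before a given Coherence level appears. The following sketch restates the table as an EnumMap; CoherenceLevel, TO_JDK, and passes are hypothetical names that mirror the table, not Coherence types. For example, LEVEL_D4 maps to FINE, so a JDK logger left at INFO filters it out, while LEVEL_D9 messages need FINEST or ALL.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.logging.Level;

/** Table 5-1 restated as code (Coherence level -> JDK level); names are hypothetical. */
class JdkLevelMapping {
    enum CoherenceLevel { NONE, INTERNAL, ERROR, WARNING, INFO,
        LEVEL_D4, LEVEL_D5, LEVEL_D6, LEVEL_D7, LEVEL_D8, LEVEL_D9, ALL }

    static final Map<CoherenceLevel, Level> TO_JDK = new EnumMap<>(CoherenceLevel.class);
    static {
        TO_JDK.put(CoherenceLevel.NONE, Level.OFF);
        TO_JDK.put(CoherenceLevel.INTERNAL, Level.FINEST);
        TO_JDK.put(CoherenceLevel.ERROR, Level.SEVERE);
        TO_JDK.put(CoherenceLevel.WARNING, Level.WARNING);
        TO_JDK.put(CoherenceLevel.INFO, Level.INFO);
        TO_JDK.put(CoherenceLevel.LEVEL_D4, Level.FINE);
        TO_JDK.put(CoherenceLevel.LEVEL_D5, Level.FINER);
        TO_JDK.put(CoherenceLevel.LEVEL_D6, Level.FINEST);
        TO_JDK.put(CoherenceLevel.LEVEL_D7, Level.FINEST);
        TO_JDK.put(CoherenceLevel.LEVEL_D8, Level.FINEST);
        TO_JDK.put(CoherenceLevel.LEVEL_D9, Level.FINEST);
        TO_JDK.put(CoherenceLevel.ALL, Level.ALL);
    }

    /** True if a JDK logger set to loggerLevel lets a message at the given Coherence level through. */
    static boolean passes(Level loggerLevel, CoherenceLevel messageLevel) {
        Level mapped = TO_JDK.get(messageLevel);
        return mapped != Level.OFF && loggerLevel != Level.OFF
                && mapped.intValue() >= loggerLevel.intValue();
    }

    public static void main(String[] args) {
        for (CoherenceLevel level : CoherenceLevel.values()) {
            System.out.println(level + " -> " + TO_JDK.get(level)
                    + ", logged at INFO: " + passes(Level.INFO, level));
        }
    }
}
```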
Using Log4j2 Logging for Coherence Logs
Applications that use the Log4j2 logging framework can configure Coherence to use Log4j2 logging as well. Detailed information about Log4j2 logging is beyond the scope of this documentation. For details on Log4j2 logging, see http://logging.apache.org/log4j/2.x/manual/index.html.
To use Log4j2 logging for Coherence logs:
Mapping Log4j2 Log Levels with Coherence Log Levels
Table 5-2 shows how Log4j2 log levels map to Coherence log levels.
Table 5-2 Mapping Log4j2 Log Levels
Log4j2 Log Level | Coherence Log Level |
---|---|
OFF | NONE |
DEBUG | INTERNAL |
ERROR | ERROR |
WARN | WARNING |
INFO | INFO |
DEBUG | LEVEL_D4 |
DEBUG | LEVEL_D5 |
DEBUG | LEVEL_D6 |
DEBUG | LEVEL_D7 |
DEBUG | LEVEL_D8 |
DEBUG | LEVEL_D9 |
ALL | ALL |
Using SLF4J for Coherence Logs
Applications that use SLF4J logging can configure Coherence to use SLF4J logging as well. Detailed information about SLF4J logging is beyond the scope of this documentation. For details on SLF4J logging, see http://www.slf4j.org/.
To use SLF4J logging:
Performing Remote Debugging
Java Debug Wire Protocol (JDWP) provides the ability to debug a JVM remotely. Most IDE tools support JDWP and are used to connect to a remote JVM that has remote debugging enabled. See your IDE's documentation for instructions on how to connect to a remote JVM.
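To enable remote debugging on a cache server, start the cache server with the following JVM options. After the cache server has started, use the IDE's debugger to connect to the JVM on the specified port (5005 in the example).
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005
On JDK 9 and later, a port-only address such as address=5005 accepts connections only from localhost. To debug from another machine, use the -agentlib:jdwp form with a host wildcard, for example:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005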
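When a debugger cannot attach, a quick sanity check is whether the target JVM was actually started with the JDWP agent. This sketch inspects the JVM's input arguments through the standard RuntimeMXBean; JdwpCheck and hasJdwpAgent are hypothetical names, and the check only recognizes the -agentlib:jdwp and -Xrunjdwp forms passed as JVM options.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: detect whether a JVM was started with the JDWP debug agent. */
class JdwpCheck {
    static boolean hasJdwpAgent(List<String> jvmArguments) {
        for (String arg : jvmArguments) {
            // Current (-agentlib:jdwp=...) and older (-Xrunjdwp:...) forms
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
    }
}
```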
Remote debugging a Coherence application can be difficult when the application no longer runs on a single-node cluster, because data is distributed across the members of the cluster. For example, parallel grid operations are performed on the cluster members where the data is located. Because there is no guarantee which member holds which data, it is best to constrain a test to use a single cache server.
In addition, the guardian and packet timeout can make cluster debugging difficult. If the debugger pauses the packet publishing, cluster, and service threads, it will cause disruptions across the cluster. In such scenarios, disable the guardian and increase the packet timeout during the debugging session. See service-guardian.
Distributed Tracing
Coherence does not include any tracing implementation libraries; the developer must provide the desired tracing runtime. Because OpenTracing is no longer maintained, Oracle recommends that you use OpenTelemetry for Java, version 1.29 or later. Although OpenTracing is deprecated in Coherence, it is still supported when used with the latest OpenTracing release, 0.33.0.
If you are using OpenTracing, and want Coherence to initialize the tracing runtime, include OpenTracing’s TracerResolver in your project’s classpath.
If you are using OpenTelemetry, and want Coherence to initialize the tracing runtime, include OpenTelemetry’s SDK Autoconfigure in your project’s classpath.
In either case, if these dependencies are not satisfied, it is assumed that the configuration of the tracing runtime is being managed outside of Coherence. Therefore, Coherence will take no action to initialize the runtime itself.
If you use coherence-grpc-proxy or coherence-java-client, you can also trace the gRPC calls made by these libraries by adding a dependency to your project: for OpenTracing, include OpenTracing gRPC Instrumentation; for OpenTelemetry, include Library Instrumentation for gRPC 1.6.0+.
This section includes the following topics:
- Configuring Tracing
- Traced Operations in Coherence
- User-initiated Tracing
Configuring Tracing
To configure tracing, edit the operational override tangosol-coherence-override.xml file and add a <tracing-config> element with a child <sampling-ratio> element. For example:
...
<tracing-config>
   <sampling-ratio>0</sampling-ratio> <!-- user-initiated tracing -->
</tracing-config>
...
Tracing operates in three modes:
- -1 - This value disables tracing.
- 0 - This value enables user-initiated tracing. Coherence does not initiate tracing on its own; the application must start an outer tracing span, from which Coherence collects the inner tracing spans. If the outer tracing span is not started, no tracing is performed.
- 0.01-1.0 - This range sets the fraction of tracing spans that are collected. For example, a value of 1.0 results in all spans being collected, while a value of 0.1 results in roughly 1 out of every 10 spans being collected.
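The three modes can be summarized in code. This hypothetical sketch classifies a ratio value and models ratio-based sampling as an independent random draw per sampling decision; it is not how Coherence or OpenTelemetry implement sampling, and SamplingRatioSketch, modeOf, and keep are illustrative names. With a ratio of 0.1, about 10% of draws are kept.

```java
import java.util.Random;

/** Hypothetical model of the three sampling-ratio modes; not Coherence or OpenTelemetry code. */
class SamplingRatioSketch {
    enum Mode { DISABLED, USER_INITIATED, SAMPLED }

    static Mode modeOf(double ratio) {
        if (ratio == -1.0) return Mode.DISABLED;
        if (ratio == 0.0) return Mode.USER_INITIATED;
        if (ratio >= 0.01 && ratio <= 1.0) return Mode.SAMPLED;
        throw new IllegalArgumentException("expected -1, 0, or 0.01-1.0 but got " + ratio);
    }

    /** In SAMPLED mode, each sampling decision keeps the span with probability equal to the ratio. */
    static boolean keep(Random random, double ratio) {
        return random.nextDouble() < ratio;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int kept = 0;
        for (int i = 0; i < 10_000; i++) {
            if (keep(random, 0.1)) {
                kept++;
            }
        }
        System.out.println(modeOf(0.1) + ": kept " + kept + " of 10000 spans");
    }
}
```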
The coherence.tracing.ratio system property can be used to specify the tracing sampling ratio instead of using the operational override file. For example:
-Dcoherence.tracing.ratio=0
Traced Operations in Coherence
- All operations exposed by the NamedCache API when using partitioned caches.
- Events processed by event listeners (such as EventInterceptor or MapListener).
- Persistence operations.
- CacheStore operations.
- Coherence gRPC calls (both client and server).
User-initiated Tracing
When the sampling ratio is set to zero, the application must start a tracing span before invoking a Coherence operation.
For example:
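import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

Tracer tracer = GlobalOpenTelemetry.getTracer("your-tracer");
// start the outer span; Coherence collects its inner spans under it
Span span = tracer.spanBuilder("test").startSpan();
NamedCache<String, String> cache = CacheFactory.getCache("some-cache");
try (Scope scope = span.makeCurrent())
{
    cache.put("a", "b");
    cache.get("a");
}
finally
{
    span.end();
}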
Note:
When user-initiated tracing is enabled, no tracing spans will be captured unless the application starts an active outer tracing span.
Troubleshooting Coherence-Based Applications
Troubleshooting Coherence-based applications is, for the most part, no different than troubleshooting other Java applications.
Most IDEs provide features that facilitate the process. In addition, many tools, such as VisualVM, JConsole, and third-party tools, provide easy ways to monitor and troubleshoot Java applications. See Prepare Java for Troubleshooting in Java Platform, Standard Edition Troubleshooting Guide.
Troubleshooting a Coherence application on a single server cluster is typically straightforward. Most Coherence development work is done in such an environment because it facilitates debugging. Troubleshooting an application that is deployed on a distributed cluster can become more challenging.
This section includes the following topics:
- Using Coherence Logs
- Using JMX Management and Coherence Reports
- Using JVM Options to Help Debug
- Using Distributed Tracing
- Capturing Thread Dumps
- Capturing Heap Dumps
- Monitoring the Operating System
Using Coherence Logs
Log messages provide information that is used to monitor and troubleshoot Coherence. See Log Message Glossary in Administering Oracle Coherence. The glossary provides additional details as well as specific actions that can be taken when a message is encountered.
Configuring logging beyond the default out-of-box configuration is very important when developing and debugging an application. Specifically, use the highest log level (level 9 or ALL when using JDK or Log4j2 logging) to ensure that all log messages are emitted. Also, consider using either JDK or Log4j2 logging. Both of these frameworks support the use of rolling files and console output simultaneously. Lastly, consider placing all log files in a common directory. A common directory makes it easier to review the log files and package them for the Coherence support team. See Configuring Logging.
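As one example of a JDK logging setup with rolling files and console output at the same time, the following sketch attaches a ConsoleHandler and a rotating FileHandler to one logger at Level.ALL. It assumes that Coherence has been configured to log through the JDK and that its logger is named Coherence (check your configuration); the class name, file pattern, and size limits are illustrative choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Sketch: console output plus rotating files for one JDK logger at the most verbose level. */
class RollingJdkLogging {
    static Logger configure(String loggerName, Path logDir) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.ALL);
        logger.setUseParentHandlers(false); // avoid duplicate lines from the root console handler

        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(Level.ALL); // ConsoleHandler defaults to INFO
        logger.addHandler(console);

        try {
            // Up to 5 files of 10 MB each; coherence-0.log is the file currently being written.
            FileHandler files = new FileHandler(
                    logDir.resolve("coherence-%g.log").toString(), 10 * 1024 * 1024, 5, true);
            files.setFormatter(new SimpleFormatter()); // FileHandler defaults to XML output
            files.setLevel(Level.ALL);
            logger.addHandler(files);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        Logger logger = configure("Coherence", dir);
        logger.finest("visible on the console and in " + dir.resolve("coherence-0.log"));
    }
}
```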
Using JMX Management and Coherence Reports
Coherence management is implemented using Java Management Extensions (JMX). Many MBeans are provided that detail the health and stability of Coherence. The MBeans provide valuable insight and should always be used when moving an application from a development environment to a fully distributed environment. MBeans are accessible using JConsole and VisualVM or any management tool that supports JMX. In addition, Coherence includes reports that gather information from the MBeans over time and provide a historical context that is not possible simply by monitoring the MBeans. The reports are most often used to identify trends that are valuable for troubleshooting. Management and reporting are not enabled by default and must be enabled. See Configuring JMX Management and Enabling Oracle Coherence Reporting on a Cluster Member in Managing Oracle Coherence.
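With management enabled, the MBeans can also be queried programmatically through the standard JMX API. This sketch lists MBeans whose names match Coherence:type=Node,*, the same naming used by the WLST script later in this chapter. It assumes the code runs in a member whose MBeans are registered with the platform MBean server; otherwise the returned set is empty. CoherenceMBeanQuery and nodeMBeans are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: list Coherence node MBeans registered in an MBean server. */
class CoherenceMBeanQuery {
    static Set<ObjectName> nodeMBeans(MBeanServer server) {
        try {
            // Pattern: Coherence domain, type=Node, any additional key properties
            return server.queryNames(new ObjectName("Coherence:type=Node,*"), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Run inside a cluster member with management enabled; otherwise nothing is printed.
        for (ObjectName name : nodeMBeans(ManagementFactory.getPlatformMBeanServer())) {
            System.out.println(name);
        }
    }
}
```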
Using JVM Options to Help Debug
Most JVMs include options that facilitate debugging and troubleshooting. These options should be used to get as much information as possible. Consult your JVM vendor's documentation for their available options. The JVM options discussed in this section are Java HotSpot specific. See the Java HotSpot VM Options Web page.
The following JVM options (standard and non-standard) can help when debugging and troubleshooting applications:
- -verbose:gc or -Xloggc:file – These options enable additional logging for each garbage collection event. In a distributed system, a GC pause on a single JVM can affect the performance of many JVMs, so it is essential to monitor garbage collection very closely. The -Xloggc option is similar to verbose GC but includes timestamps.
- -Xprof and -Xrunhprof – These options are used to view JVM profile data and are not intended for production systems. Both apply only to older JDKs: the hprof agent was removed in JDK 9 and -Xprof was removed in JDK 10.
- -XX:+PrintGC, -XX:+PrintGCDetails, and -XX:+PrintGCTimeStamps – These options also print messages at garbage collection (a + enables a HotSpot boolean option; a - disables it). On JDK 9 and later, these flags and -Xloggc are deprecated or removed in favor of unified logging, for example -Xlog:gc*:file=gc.log.
- -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=./java_pid<pid>.hprof – These options initiate a heap dump when a java.lang.OutOfMemoryError is thrown.
- -XX:ErrorFile=./hs_err_pid<pid>.log – This option saves error data to a file.
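The GC logging options above write to a log; the same counters can also be polled from inside the JVM through the standard GarbageCollectorMXBean interface, which is handy for spotting GC activity that coincides with cluster problems. This sketch is illustrative: GcSnapshot and take are hypothetical names, and collector names vary by garbage collector and JDK.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: cumulative GC counts and times for this JVM, read from the platform MXBeans. */
class GcSnapshot {
    /** Maps each collector name to {collection count, total collection time in ms}. */
    static Map<String, long[]> take() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value is -1 if the collector does not report it.
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, long[]> before = take();
        Thread.sleep(10_000); // sampling interval
        for (Map.Entry<String, long[]> after : take().entrySet()) {
            long[] start = before.get(after.getKey());
            if (start == null) {
                continue;
            }
            System.out.printf("%s: %d collections, %d ms in the last 10 s%n", after.getKey(),
                    after.getValue()[0] - start[0], after.getValue()[1] - start[1]);
        }
    }
}
```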
Using Distributed Tracing
Distributed tracing allows developers to profile and monitor applications. This is particularly useful in a clustered environment such as Coherence. See Distributed Tracing.
Capturing Thread Dumps
Thread dumps are used to see detailed thread information, such as thread state, for each thread in the JVM. A thread dump also includes information on each deadlocked thread (if applicable). Thread dumps are useful because of Coherence's multi-threaded and distributed architecture. Thread dumps are often used to troubleshoot an application that is operating slowly or is deadlocked. Make sure to always collect several dumps over a period of time, since a thread dump is only a snapshot in time. Always include a set of thread dumps when submitting a support issue.
Coherence provides a native logClusterState JMX operation that is located on the ClusterMBean MBean and a native logNodeState JMX operation that is located on the ClusterNodeMBean MBean. These operations initiate a thread dump (including outstanding polls) on multiple cluster members or on a single cluster member, respectively. See ClusterMBean and ClusterNodeMBean in Managing Oracle Coherence.
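Outside WebLogic, the logClusterState operation can be invoked through any JMX connection. This sketch assumes the operation takes a single String role name, as in the WLST script later in this section, and that the cluster MBean name matches Coherence:type=Cluster,*; ClusterStateDump is a hypothetical name. A remote MBeanServerConnection can be obtained with JMXConnectorFactory.connect(url).getMBeanServerConnection().

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Sketch: call logClusterState on every Coherence cluster MBean visible through a connection. */
class ClusterStateDump {
    /** Returns the number of cluster MBeans on which logClusterState was invoked. */
    static int logClusterState(MBeanServerConnection connection, String roleName) {
        try {
            Set<ObjectName> clusters =
                    connection.queryNames(new ObjectName("Coherence:type=Cluster,*"), null);
            for (ObjectName cluster : clusters) {
                connection.invoke(cluster, "logClusterState",
                        new Object[] {roleName}, new String[] {String.class.getName()});
            }
            return clusters.size();
        } catch (IOException | JMException e) {
            throw new IllegalStateException("logClusterState failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: ClusterStateDump <roleName>");
            return;
        }
        int count = logClusterState(ManagementFactory.getPlatformMBeanServer(), args[0]);
        System.out.println("logClusterState invoked on " + count + " cluster MBean(s)");
    }
}
```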
To perform a thread dump locally on Unix or Linux operating systems, press Ctrl+\ at the application console. To perform a thread dump on Windows, press Ctrl+Break (or Pause). Both methods include a heap summary with the thread dump.
Most IDEs provide a thread dump feature that can be used to capture thread dumps while working in the IDE. In addition, on Unix and Linux operating systems, use kill -3 <pid> to cause a thread dump of a remote Java process; the dump is written to that process's standard output (for example, the IDE console if the IDE launched the process). On Windows, use a third-party tool (such as SendSignal) to send a Ctrl+Break signal to a remote Java process and cause a dump.
Profiling tools, such as Oracle's VisualVM (visualvm) and JConsole (jconsole), are able to perform thread dumps. These tools are very useful because they provide a single tool for troubleshooting and debugging and display many different types of information in addition to just thread details.
Lastly, the jstack tool can be used to capture a thread dump for any process. For example, use jps to find a Java process ID and then execute the following from the command line:
jstack <pid>
The jstack tool is unsupported and may or may not be available in future versions of the JDK.
Note:
By default, the thread dumps generated by Coherence, either through the ClusterMBean MBean or the service guardian, do not include lock and deadlock analysis. Thread lock and deadlock analysis may have a significant performance impact. To include the analysis for troubleshooting purposes, you can set the system property com.oracle.coherence.common.util.Threads.dumpLocks to true.
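Besides the tools above, a thread dump can be captured from inside the JVM with the standard ThreadMXBean, which also reports deadlocked threads. This is a JDK-only sketch; ThreadDumpSketch and its methods are hypothetical names. It prints every stack frame itself because ThreadInfo.toString() truncates long stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Sketch: in-process thread dump and deadlock check using the standard ThreadMXBean. */
class ThreadDumpSketch {
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Include monitor and synchronizer details when the JVM supports them (this adds overhead).
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(), threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** IDs of threads deadlocked on monitors or ownable synchronizers; empty if none. */
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```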
The following WLST script can be used to trigger cluster-wide thread dumps when running Coherence in a WebLogic domain. The script expects adminUser, adminPasswd, adminHost, adminPort, and cohClusterName to be defined beforehand; mbs is the MBean server connection that WLST provides after domainRuntime() is called.
from javax.management import ObjectName
# connect to the domain runtime MBean server
connect(adminUser, adminPasswd, "t3://%s:%s" % (adminHost, adminPort))
domainRuntime()
# obtain the Coherence cluster MBean
cohClusterON=ObjectName("Coherence:type=Cluster,cluster=%s" % cohClusterName)
cohClusterMBean=list(mbs.queryMBeans(cohClusterON, None))
# obtain the MBeans of all Coherence nodes associated with this cluster
cohClusterNodes=list(mbs.queryMBeans(ObjectName("Coherence:type=Node,cluster=%s,*" % cohClusterName), None))
# logClusterState takes a role name, so invoke it once per distinct role
types=["java.lang.String"]
dumpedRoles=[]
for node in cohClusterNodes:
    roleName=mbs.getAttribute(node.getObjectName(), "RoleName")
    if roleName not in dumpedRoles:
        dumpedRoles.append(roleName)
        mbs.invoke(cohClusterMBean[0].getObjectName(), 'logClusterState', [roleName], types)
Capturing Heap Dumps
Heap dumps are used to see detailed information for all the objects in a JVM heap. The information includes how many instances of an object are loaded and how much memory is allocated to the objects. Heap information is typically used to find parts of an application that may be wasting resources and causing poor performance. In a fully distributed Coherence environment, heap dumps can be tricky because application processing is occurring across the cluster and problematic objects may not be local to a single JVM. Make sure to always collect several dumps over a period of time, since a heap dump is only a snapshot in time. Always include heap dumps when submitting a support issue.
The easiest way to capture a heap dump is to use a profiling tool. Oracle's VisualVM (visualvm) and JConsole (jconsole) provide heap dump features. In addition, most IDEs provide a heap dump feature that can be used to capture heap dumps while working in the IDE.
As an alternative, the jmap tool can be used to capture heap dumps. For example, use jps to find a Java process ID and then execute the following from the command line:
jmap -dump:format=b,file=/tmp/coherence.bin <pid>
The resulting file can be loaded into VisualVM for viewing. On JDK 8 and earlier, the jhat tool can also be used: execute the following from the command line and then browse to the returned address to view the heap dump in a browser.
jhat /tmp/coherence.bin
The jmap tool is unsupported and may or may not be available in future versions of the JDK. The jhat tool was removed in JDK 9.
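On HotSpot-based JDKs, a heap dump can also be written programmatically through the HotSpotDiagnosticMXBean, which produces the same hprof format that VisualVM reads. This sketch is illustrative; HeapDumpSketch is a hypothetical name. Writing a dump pauses the application threads, so avoid triggering it casually on a busy cluster member.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

/** Sketch: write a heap dump of the current JVM (HotSpot-based JDKs only). */
class HeapDumpSketch {
    /**
     * Writes an hprof-format heap dump to outputFile, which should end in ".hprof" and
     * must not already exist. With liveOnly = true, only reachable objects are dumped.
     */
    static void dumpHeap(String outputFile, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            diagnostics.dumpHeap(outputFile, liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String file = args.length > 0 ? args[0] : "coherence-" + ProcessHandle.current().pid() + ".hprof";
        dumpHeap(file, true);
        System.out.println("heap dump written to " + file);
    }
}
```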
Monitoring the Operating System
Always monitor a cluster member's operating system when troubleshooting and debugging Coherence-based applications. Poorly tuned operating systems can affect the overall performance of the cluster and may have adverse effects on an application. See Operating System Tuning in Administering Oracle Coherence.
In particular, the following areas are important to monitor:
- CPU – Is the processor running at 100% for extended periods of time?
- Memory/Swapping – Is the available RAM being exhausted and causing swap space to be used?
- Network – Are the buffer size, the datagram size, and the Maximum Transmission Unit (MTU) size affecting performance and success rates?
To monitor the overall health of the operating system, use tools such as vmstat and top for Unix/Linux; for Windows, use perfmon and tools available from Windows Sysinternals (for example, procexp and procmon). See Performing a Network Performance Test in Administering Oracle Coherence.
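For a coarse in-process view of the CPU and swap questions above, the standard OperatingSystemMXBean exposes the system load average, and the HotSpot extension com.sun.management.OperatingSystemMXBean exposes swap sizes. This sketch is illustrative only: OsSnapshot is a hypothetical name, and treating a load average at or above the processor count as saturated is a rough heuristic, not a Coherence guideline. It does not replace vmstat, top, or perfmon.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Sketch: a coarse in-process view of CPU load and swap usage on the host of this JVM. */
class OsSnapshot {
    /** Rough heuristic (an assumption): load at or above the processor count means saturated. */
    static boolean cpuSaturated(double loadAverage, int processors) {
        return loadAverage >= 0 && loadAverage >= processors; // negative means "not available"
    }

    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();  // last minute; negative if unavailable (for example, on Windows)
        int cpus = os.getAvailableProcessors();   // processors available to this JVM
        StringBuilder out = new StringBuilder()
                .append("load=").append(load)
                .append(" cpus=").append(cpus)
                .append(" saturated=").append(cpuSaturated(load, cpus));
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean ext = (com.sun.management.OperatingSystemMXBean) os;
            long usedSwap = ext.getTotalSwapSpaceSize() - ext.getFreeSwapSpaceSize();
            out.append(" swapUsedBytes=").append(usedSwap);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```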