1 Introduction to Coherence
This chapter includes the following sections:
- Basic Concepts
  Learn about Coherence clustering, configuration, caching, data storage, and serialization.
- Read/Write Caching
  The Coherence NamedCache API is the primary interface used by applications to get and interact with cache instances.
- Querying the Cache
  Coherence provides the ability to query cached data.
- Invocation Service
  The Coherence invocation service can deploy computational agents to various nodes within the cluster.
- Event Programming
  Coherence supports two event programming models that allow applications to receive and react to notifications of cluster operations.
- Transactions
  Coherence includes various transaction options that provide different transaction guarantees.
- HTTP Session Management
  Coherence*Web is an HTTP session-management module with support for a wide range of application servers.
- Object-Relational Mapping Integration
  Most ORM products support Coherence as an "L2" caching plug-in. These solutions cache entity data inside Coherence, allowing applications on multiple servers to share cached data.
- C++/.NET Integration
  Coherence provides support for cross-platform clients over TCP/IP.
- Management and Monitoring
  Coherence offers management and monitoring facilities using Java Management Extensions (JMX).
- Using Java Modules to Build a Coherence Application
  From Coherence 14c (14.1.1.0.0), most of the Coherence jars are Java modules (with the module-info.java file in each jar). To see the module name, characteristics, and dependencies of each jar, use the --describe-module operation modifier of the jar command.
Parent topic: Getting Started
Basic Concepts
Learn about Coherence clustering, configuration, caching, data storage, and serialization.
This section includes the following topics:
- Clustered Data Management
- A single API for the logical layer, XML configuration for the physical layer
- Caching Strategies
- Data Storage Options
- Serialization Options
- Configurability and Extensibility
- Namespace Hierarchy
Parent topic: Introduction to Coherence
Clustered Data Management
At the core of Coherence is the concept of clustered data management. This implies the following goals:
- A fully coherent, single system image (SSI)
- Scalability for both read and write access
- Fast, transparent failover and failback
- Linear scalability for storage and processing
- No single points of failure (SPOFs)
- Cluster-wide locking and transactions
Built on top of this foundation are the various services that Coherence provides, including database caching, HTTP session management, grid agent invocation and distributed queries. Before going into detail about these features, some basic aspects of Coherence should be discussed.
Parent topic: Basic Concepts
A single API for the logical layer, XML configuration for the physical layer
Coherence supports many topologies for clustered data management. Each of these topologies has a trade-off in terms of performance and fault-tolerance. By using a single API, the choice of topology can be deferred until deployment if desired. This allows developers to work with a consistent logical view of Coherence, while providing flexibility during tuning or as application needs change.
Parent topic: Basic Concepts
Caching Strategies
Coherence provides several cache implementations:
- Local Cache – Local on-heap caching for non-clustered caching. See Understanding Local Caches.
- Distributed Cache – True linear scalability for both read and write access. Data is automatically, dynamically, and transparently partitioned across nodes. The distribution algorithm minimizes network traffic and avoids service pauses by incrementally shifting data. See Understanding Distributed Caches.
- Near Cache – Provides the performance of local caching with the scalability of distributed caching. Several different near-cache strategies are available and offer a trade-off between performance and synchronization guarantees. See Understanding Near Caches.
- View Cache – Perfect for small, read-heavy caches. See Understanding View Caches.
In-process caching provides the highest level of raw performance, because objects are managed within the local JVM. This benefit is most directly realized by the Local, View, and Near Cache implementations.
Out-of-process (client/server) caching provides the option of using dedicated cache servers. This can be helpful when you want to partition workloads (to avoid stressing the application servers). This is accomplished by using the Partitioned cache implementation and simply disabling local storage on client nodes through a single command-line option or a one-line entry in the XML configuration.
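As a sketch of that command-line approach (the jar names and main class below are placeholders), local storage can be disabled on an application-server node with a single system property, while cache server JVMs leave the property at its default and therefore hold the partitioned data:

```
java -Dcoherence.distributed.localstorage=false -cp coherence.jar:app.jar com.example.Main
```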
Tiered caching (using the Near Cache functionality) enables you to couple local caches on the application server with larger, partitioned caches on the cache servers, combining the raw performance of local caching with the scalability of partitioned caching. This is useful for both dedicated cache servers and co-located caching (cache partitions stored within the application server JVMs).
See Using Caches.
Parent topic: Basic Concepts
Data Storage Options
While most customers use on-heap storage combined with dedicated cache servers, Coherence has several options for data storage:
- On-heap—The fastest option, though it can affect JVM garbage collection times.
- Journal—A combination of RAM storage and disk storage, optimized for solid state disks, that uses a journaling technique. Journal-based storage requires serialization/deserialization.
- File-based—Uses a Berkeley Database JE storage system.
Coherence storage is transient: the disk-based storage options are for managing cached data only. For persistent storage, Coherence offers backing maps coupled with a CacheLoader/CacheStore.
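As an illustration of the CacheStore contract (a sketch only: the class name is hypothetical, and a trivial in-memory map stands in for a real JDBC or service-backed store), an implementation provides load and store operations that Coherence invokes on cache misses and updates:

```java
import com.tangosol.net.cache.CacheStore;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class SimpleCacheStore implements CacheStore
    {
    // Stand-in for a real persistent data source.
    private final Map database = new HashMap();

    public Object load(Object key)              { return database.get(key); }

    public Map loadAll(Collection keys)
        {
        Map result = new HashMap();
        for (Object key : keys) { result.put(key, database.get(key)); }
        return result;
        }

    public void store(Object key, Object value) { database.put(key, value); }

    public void storeAll(Map entries)           { database.putAll(entries); }

    public void erase(Object key)               { database.remove(key); }

    public void eraseAll(Collection keys)
        {
        for (Object key : keys) { database.remove(key); }
        }
    }
```

The store is then referenced from a cachestore-scheme in the backing map configuration so that Coherence calls it transparently on behalf of the application.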
Parent topic: Basic Concepts
Serialization Options
Because serialization is often the most expensive part of clustered data management, Coherence provides the following options for serializing/deserializing data:
- com.tangosol.io.pof.PofSerializer – The Portable Object Format (also referred to as POF) is a language-agnostic binary format. POF was designed to be incredibly efficient in both space and time and is the recommended serialization option in Coherence. See Using Portable Object Format.
- java.io.Serializable – The simplest, but slowest option.
- java.io.Externalizable – Requires developers to implement serialization manually, but can provide significant performance benefits. Compared to java.io.Serializable, this can cut serialized data size by a factor of two or more (especially helpful with Distributed caches, as they generally cache data in serialized form). Most importantly, CPU usage is dramatically reduced.
- com.tangosol.io.ExternalizableLite – Very similar to java.io.Externalizable, but offers better performance and less memory usage by using a more efficient I/O stream implementation.
- com.tangosol.run.xml.XmlBean – A default implementation of ExternalizableLite.
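As a pure-JDK sketch of the java.io.Externalizable approach (the class and its fields are illustrative), the fields are written and read explicitly, which avoids serializing per-field metadata:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class Person implements Externalizable
    {
    private String name;
    private int age;

    public Person() {}  // public no-arg constructor is required by Externalizable
    public Person(String name, int age) { this.name = name; this.age = age; }

    public String getName() { return name; }
    public int getAge()     { return age; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException
        {
        out.writeUTF(name);   // only the field values are written
        out.writeInt(age);
        }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException
        {
        name = in.readUTF();  // fields must be read in the same order
        age  = in.readInt();
        }

    // Helper demonstrating a serialize/deserialize round trip.
    public static Person roundTrip(Person p) throws Exception
        {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf))
            {
            oos.writeObject(p);
            }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray())))
            {
            return (Person) ois.readObject();
            }
        }
    }
```

The ExternalizableLite and PofSerializer options follow the same explicit read/write pattern against Coherence's own stream and POF reader/writer types.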
Parent topic: Basic Concepts
Configurability and Extensibility
Coherence's API provides access to all Coherence functionality. The most commonly used subset of this API is exposed through simple XML options to minimize effort for typical use cases. There is no penalty for mixing direct configuration through the API with the easier XML configuration.
Coherence is designed to allow the replacement of its modules as needed. For example, the local "backing maps" (which provide the actual physical data storage on each node) can be easily replaced as needed. The vast majority of the time, this is not required, but it is there for the situations that require it. The general guideline is that 80% of tasks are easy, and the remaining 20% of tasks (the special cases) require a little more effort, but certainly can be done without significant hardship.
Parent topic: Basic Concepts
Namespace Hierarchy
Coherence is organized as a set of services. At the root is the Cluster service. A cluster is defined as a set of Coherence instances (one instance per JVM, with one or more JVMs on each computer). See Introduction to Coherence Clusters. Under the cluster service are the various services that comprise the Coherence API. These include the various caching services (Distributed, Federated, and so on) and the Invocation Service (for deploying agents to various nodes of the cluster). Each instance of a service is named, and there is typically a default service instance for each type. The cache services contain named caches (com.tangosol.net.NamedCache), which are analogous to database tables—that is, they typically contain a set of related objects.
Parent topic: Basic Concepts
Read/Write Caching
The Coherence NamedCache API is the primary interface used by applications to get and interact with cache instances.
This section includes the following topics:
- NamedCache
- NamedCache Usage Patterns
Parent topic: Introduction to Coherence
NamedCache
The following source code returns a reference to a NamedCache instance. The underlying cache service is started if necessary.
import com.tangosol.net.*;
...
NamedCache cache = CacheFactory.getCache("MyCache");
Coherence scans the cache configuration XML file for a name mapping for MyCache. This is similar to Servlet name mapping in a web container's web.xml file. Coherence's cache configuration file contains (in the simplest case) a set of mappings (from cache name to cache scheme) and a set of cache schemes.
By default, Coherence uses the coherence-cache-config.xml file found at the root of coherence.jar. This can be overridden on the JVM command-line with -Dcoherence.cacheconfig=file.xml. This argument can reference either a file system path or a Java resource path.
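A minimal cache configuration sketch (the cache name MyCache and the scheme name example-distributed are illustrative) showing the mapping from cache name to cache scheme:

```xml
<?xml version="1.0"?>
<cache-config xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config">
  <caching-scheme-mapping>
    <!-- Maps the name used by CacheFactory.getCache("MyCache")
         to a scheme defined below. -->
    <cache-mapping>
      <cache-name>MyCache</cache-name>
      <scheme-name>example-distributed</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>example-distributed</scheme-name>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
```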
The com.tangosol.net.NamedCache interface extends several other interfaces:
- java.util.Map—basic Map methods such as get(), put(), remove().
- com.tangosol.net.cache.CacheMap—methods for getting a collection of keys (as a Map) that are in the cache and for putting objects in the cache. Also supports adding an expiry value when putting an entry in a cache.
- com.tangosol.util.QueryMap—methods for querying the cache. See Querying Data in a Cache.
- com.tangosol.util.InvocableMap—methods for server-side processing of cache data. See Processing Data In a Cache.
- com.tangosol.util.ObservableMap—methods for listening to cache events. See Using Map Events.
- com.tangosol.util.ConcurrentMap—methods for concurrent access such as lock() and unlock(). See Performing Transactions.
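A short sketch exercising several of these interfaces through one NamedCache reference (assumes a running Coherence cluster; the cache name "example" and the expiry and lock values are illustrative):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class NamedCacheDemo
    {
    public static void main(String[] args)
        {
        NamedCache cache = CacheFactory.getCache("example");
        cache.put("key", "value");         // java.util.Map
        cache.put("temp", "value", 5000);  // CacheMap: entry expires after 5 seconds
        cache.lock("key", -1);             // ConcurrentMap: wait indefinitely for the lock
        try
            {
            cache.put("key", "updated");
            }
        finally
            {
            cache.unlock("key");
            }
        }
    }
```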
Before an application can use a NamedCache, a single thread must ensure that:
- a cluster is running
- a service is running
- a cache is created
Coherence was originally designed to perform these tasks using synchronized() blocks (intrinsic Java locks) because multiple threads can perform the above simultaneously.
Because intrinsic locks are non-interruptible, this approach could lead to application threads blocking indefinitely, waiting for the lock when, for example, a network outage occurs.
Interruptible locks provide the ability to do a try-lock that times out. This allows an application to not be indefinitely blocked. With this improvement, you can now start a cluster or cache within a try-with-resources block using a thread local timeout. See Class Timeout.
try (Timeout t = Timeout.after(20, TimeUnit.SECONDS))
    {
    CacheFactory.getCache("dist"); // or CacheFactory.ensureCluster();
    }
catch (Exception e)
    {
    // do something else
    }
The default lock timeout is Long.MAX_VALUE. If the operation (getCache, ensureCluster, and so on) does not return within the specified timeout, an InterruptedException is thrown. If the corresponding locks cannot be acquired within the specified timeout, a RequestTimeoutException is thrown. The service request timeout is also honored for the service lock, if specified. For example, when references to caches are maintained and member restarts occur, underlying resources must be obtained again on the next call performed on a NamedCache reference. In this case, a configured request timeout prevents blocking indefinitely.
If both request timeout and thread local timeout are specified, the latter takes precedence.
Parent topic: Read/Write Caching
NamedCache Usage Patterns
There are two general approaches to using a NamedCache:
- As a clustered implementation of java.util.Map with several added features (queries, concurrency), but with no persistent backing (a "side" cache).
- As a means of decoupling access to external data sources (an "inline" cache). In this case, the application uses the NamedCache interface, and the NamedCache takes care of managing the underlying database (or other resource).
Typically, an inline cache is used to cache data from:
- a database—The most intuitive use of a cache—simply caching database tables (in the form of Java objects).
- a service—Mainframe, web service, service bureau—any service that represents an expensive resource to access (either due to computational cost or actual access fees).
- calculations—Financial calculations, aggregations, data transformations. Using an inline cache makes it very easy to avoid duplicating calculations. If the calculation is complete, the result is simply pulled from the cache. Since any serializable object can be used as a cache key, it is a simple matter to use an object containing calculation parameters as the cache key.
See Caching Data Sources.
Write-back options:
- write-through—Ensures that the external data source always contains up-to-date information. Used when data must be persisted immediately, or when sharing a data source with other applications.
- write-behind—Provides better performance by caching writes to the external data source. Not only can writes be buffered to even out the load on the data source, but multiple writes can be combined, further reducing I/O. The trade-off is that data is not immediately persisted to disk; however, it is immediately distributed across the cluster, so the data survives the loss of a server. Furthermore, if the entire data set is cached, the application can temporarily survive a complete failure of the data source, because neither cache reads nor writes require synchronous access to the data source.
Parent topic: Read/Write Caching
Querying the Cache
To query across a NamedCache instance, all objects should implement a common interface (or base class). Any field of an object can be queried; indexes are optional and are used to increase performance. See Querying Data in a Cache.
To add an index to a NamedCache, you first need a value extractor (which accepts as input a value object and returns an attribute of that object). Indexes can be added blindly (duplicate indexes are ignored). Indexes can be added at any time, before or after inserting data into the cache.
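For example (a sketch only: it assumes a running cluster and a cache of hypothetical value objects that expose a getAge() accessor):

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.GreaterFilter;
import java.util.Set;

public class QueryDemo
    {
    public static void main(String[] args)
        {
        NamedCache people = CacheFactory.getCache("people");
        // Value extractor: invokes getAge() on each cached value.
        // fOrdered=true builds a sorted index, which helps range filters.
        people.addIndex(new ReflectionExtractor("getAge"), true, null);
        // Query the cache; Coherence applies the index automatically.
        Set adults = people.entrySet(new GreaterFilter("getAge", 18));
        }
    }
```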
It should be noted that queries apply only to cached data. For this reason, queries should not be used unless the entire data set has been loaded into the cache, unless additional support is added to manage partially loaded sets.
Developers have the option of implementing additional custom filters for queries, thus taking advantage of query parallel behavior. For particularly performance-sensitive queries, developers may implement index-aware filters, which can access Coherence's internal indexing structures.
Coherence includes a built-in optimizer, and applies indexes in the optimal order. Because of the focused nature of the queries, the optimizer is both effective and efficient. No maintenance is required.
Parent topic: Introduction to Coherence
Invocation Service
The invocation service is accessed through the InvocationService interface and includes the following two methods:
public void execute(Invocable task, Set setMembers, InvocationObserver observer);
public Map query(Invocable task, Set setMembers);
An instance of the service can be retrieved from the CacheFactory class.
Coherence implements the WorkManager API for task-centric processing.
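A sketch of a computational agent (the class name is illustrative, and the service name "InvocationService" in the usage comment is an assumption; use the name configured in your cache configuration):

```java
import com.tangosol.net.AbstractInvocable;

// Agent that reports the free heap of the member it runs on.
public class FreeMemoryAgent extends AbstractInvocable
    {
    @Override
    public void run()
        {
        setResult(Runtime.getRuntime().freeMemory());
        }
    }

// Usage sketch: run the agent on all members and collect results per member.
//
// InvocationService service =
//     (InvocationService) CacheFactory.getService("InvocationService");
// Map results = service.query(new FreeMemoryAgent(), null);
```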
Parent topic: Introduction to Coherence
Event Programming
The event programming models are:
-
Live Events – The live event programming model uses user-defined event interceptors that are registered to receive different types of events. Applications decide what action to take based on the event type. Many events that are available through the use of map events are also supported using live events. See Using Live Events.
-
Map Events – The map event programming model uses user-defined map listeners that are attached to the underlying map implementation. Map events offer customizable server-based filters and lightweight events that can minimize network traffic and processing. Map listeners follow the JavaBean paradigm and can distinguish between system cache events (for example, eviction) and application cache events (for example, get/put operations). See Using Map Events.
Parent topic: Introduction to Coherence
Transactions
Coherence includes various transaction options that provide different transaction guarantees.
Coherence transaction options include: basic data concurrency using the ConcurrentMap interface and EntryProcessor API, partition-level transactions using implicit locking and the EntryProcessor API, atomic transactions using the Transaction Framework API, and atomic transactions with full XA support using the Coherence resource adapter. See Performing Transactions.
Parent topic: Introduction to Coherence
HTTP Session Management
Using Coherence session management does not require any changes to the application. Coherence*Web uses near caching to provide fully fault-tolerant caching, with almost unlimited scalability (to several hundred cluster nodes without issue).
Parent topic: Introduction to Coherence
Object-Relational Mapping Integration
Parent topic: Introduction to Coherence
C++/.NET Integration
Parent topic: Introduction to Coherence
Management and Monitoring
Parent topic: Introduction to Coherence
Using Java Modules to Build a Coherence Application
From Coherence 14c (14.1.1.0.0), most of the Coherence jars are Java modules (with the module-info.java
file in each jar). To see the module name, characteristics, and dependencies of each jar, use the --describe-module
operation modifier of the jar command.
jar --file=coherence.jar --describe-module
Note:
The following Coherence jars are not Java modules:
- coherence-http-grizzly.jar
- coherence-http-jetty.jar
- coherence-http-netty.jar
- coherence-http-simple.jar
- coherence-web.jar
Each of the Coherence modules is open, which grants reflective access to all of its packages to other modules. However, Coherence may require you to open or export modules it depends on or explicitly add transitive dependencies.
For instance, when using lambdas in a distributed environment, as described
in About Lambdas in a Distributed Environment, you need to open any application module(s) containing distributed
lambda(s) to module com.oracle.coherence
. This allows for resolving the
distributed lambda to the application's lambda(s) during deserialization.
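As an illustration (the module name com.example.app and package com.example.app.functions below are hypothetical), the application's module descriptor could open the package containing the lambdas to Coherence:

```java
// module-info.java of the hypothetical application module
module com.example.app {
    requires com.oracle.coherence;
    // Grant Coherence reflective access so that distributed lambdas
    // in this package can be resolved during deserialization.
    opens com.example.app.functions to com.oracle.coherence;
}
```

The same effect can be achieved at launch time with --add-opens com.example.app/com.example.app.functions=com.oracle.coherence.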
In general, the java.lang.IllegalAccessError or java.lang.IllegalAccessException error message provides descriptive information about the module that you need to open or export.
Parent topic: Introduction to Coherence