Clustering and In-Memory Caching: Rationale and Design for Changing Models

As of version 4.5, Jive includes a new framework for in-memory caching. It replaces the Oracle Coherence-based in-memory cache used in releases prior to 4.5.0; Coherence is no longer part of the application. The new caching subsystem operates as an external service, with simpler administration and greater stability and flexibility than in previous releases.

As part of this change, the clustering and caching features were separated into two distinct frameworks. Previously, Coherence had handled both features. This topic describes changes to both features.

The changes described here involve design and behavior concepts that are described more deeply in In-Memory Caching Overview.

FAQ: In-Memory Caching and Clustering Changes

This FAQ answers questions about the changes in both design and practical terms.

What happened?

In versions prior to 4.5, the application's in-memory caching and clustering features were based on Oracle Coherence. As of version 4.5, the two features no longer share an implementation framework; both were reimplemented from scratch and Coherence was removed. Generally speaking, these changes were made to help ensure that the application continues to perform well under heavy load.

What was wrong with the previous model?

While the previous model for clustering and caching worked reasonably well for many releases, the application had begun to outgrow the model in many of its deployments. Here are the main reasons:

What changed?

See Changes in Clustered Installations below for a detailed list of design characteristics and changes.

Will clustering and caching be changed in previous versions?

There are no plans to change the clustering and caching features in versions prior to 4.5.

How are caching and clustering related now?

The features aren't two aspects of the same framework (as they were prior to version 4.5) -- they're separate now, but interoperate. Although the parts of the caching system are not aware of their presence in a cluster, the clustering system is aware of the caches. For example, when changes occur to data in one node's near cache, the clustering system is responsible for ensuring that the other nodes are aware of the change.
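
The relationship described above can be sketched in a few lines. This is a minimal illustration, not the application's implementation: the Cluster and Node classes, the per-key invalidation, and the plain dict standing in for the remote cache server are all assumptions made for the example. It shows a near cache notifying the clustering layer of a change, and the clustering layer delivering queued notices to the other nodes in a batch so that they fall back to the cache server.

```python
class Cluster:
    """Hypothetical stand-in for the clustering system: it queues change
    notices from nodes and delivers them to the other nodes in batches."""

    def __init__(self):
        self.nodes = []
        self.pending = []  # queued (origin node, cache name, key) notices

    def join(self, node):
        self.nodes.append(node)

    def notify_change(self, origin, cache_name, key):
        self.pending.append((origin, cache_name, key))

    def flush(self):
        """Deliver all queued notices to every node except the originator."""
        batch, self.pending = self.pending, []
        for node in self.nodes:
            stale = [(c, k) for origin, c, k in batch if origin is not node]
            node.invalidate(stale)


class Node:
    """Hypothetical application node with a near cache. The cache itself
    knows nothing about the cluster beyond reporting that a change occurred."""

    def __init__(self, name, cluster, cache_server):
        self.name = name
        self.cluster = cluster
        self.cache_server = cache_server  # shared dict standing in for the remote server
        self.near = {}
        cluster.join(self)

    def put(self, cache_name, key, value):
        self.near[(cache_name, key)] = value
        self.cache_server[(cache_name, key)] = value
        self.cluster.notify_change(self, cache_name, key)

    def get(self, cache_name, key):
        k = (cache_name, key)
        if k in self.near:
            return self.near[k]
        value = self.cache_server.get(k)  # near-cache miss: go to the cache server
        if value is not None:
            self.near[k] = value
        return value

    def invalidate(self, stale_keys):
        # Drop stale entries so the next read goes to the cache server.
        for k in stale_keys:
            self.near.pop(k, None)
```

In this sketch, a node that has already read a value keeps serving its near-cache copy until the cluster's next flush, which is the window in which nodes can briefly disagree.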

Changes in Single-Machine Installations

Changes in Clustered Installations

The items listed here describe aspects of the new caching model in a clustered context. Many of these characteristics directly differ from the previous model.

General Design Changes

  • Near cache changes propagated. In the new model, when a change occurs on one node -- such as a user change that results in a "put" to the near cache -- the change is propagated to near caches on other nodes. The near cache that received the change notifies the cluster management system that a change has occurred. The clustering system sends batched changes to other nodes in the cluster. Once aware that a change to data they care about has occurred, other nodes know to go to the cache server for items in the changed cache rather than using their near cache.
  • Near caches synchronized every 1/2 second. The near cache instances are synchronized across the cluster every 1/2 second. In other words, an update from node A might not be seen on node B for up to 0.5 seconds.
  • Near cache replaces node affinity for immediacy. In the previous model, cache entries that were frequently used by a particular node could be copied to that node. In the new model, the cache system relies on a larger near cache for similar functionality.
  • Cache operations are transaction-aware. Cache operations inside a transaction will only be executed once the transaction is complete. If the transaction rolls back, any cache updates or deletes are dropped.
  • Eventual cluster node consistency replaces atomic cache operations. In the new model, the goal is eventual consistency. In the short-term cache implementation, updates can be delayed by half a second. In the previous model, cache operations were atomic, meaning they were consistent across the cluster.
  • Distributed cache with larger near cache. The new model does not support optimistic replication, in which all data is replicated on all nodes. Instead, it supports distributed caching with a remote cache server and a much larger near cache than the previous model.
  • Values deserialized to near cache. In the new model, cache values are deserialized when returned from a remote cache server and stored in their deserialized form in the near cache. Local caches involve no object serialization.
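
The transaction-aware behavior in the list above can be illustrated with a small sketch. It assumes a hypothetical TransactionalCache class with explicit begin, commit, and rollback calls; the application's actual transaction integration is not described in this topic. Cache writes made inside a transaction are deferred until commit and dropped on rollback.

```python
class TransactionalCache:
    """Hypothetical cache that defers writes made inside a transaction."""

    def __init__(self):
        self.store = {}
        self._pending = None  # None means no transaction is open

    def begin(self):
        self._pending = []

    def put(self, key, value):
        self._apply_or_defer(("put", key, value))

    def remove(self, key):
        self._apply_or_defer(("remove", key, None))

    def commit(self):
        # Transaction completed: replay the deferred operations in order.
        ops, self._pending = self._pending or [], None
        for op in ops:
            self._apply(op)

    def rollback(self):
        # Transaction rolled back: deferred updates and deletes are dropped.
        self._pending = None

    def _apply_or_defer(self, op):
        if self._pending is None:
            self._apply(op)  # outside a transaction, apply immediately
        else:
            self._pending.append(op)

    def _apply(self, op):
        kind, key, value = op
        if kind == "put":
            self.store[key] = value
        else:
            self.store.pop(key, None)
```

Deferring the writes means the cache never holds a value that the database later rolls back.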

Changes That Can Affect Requirements and Installation

  • Separate cache server instead of caches distributed among nodes. Perhaps the most important design difference between the previous caching model and the new one is that in the previous model, each app node in a cluster had its own cache and those caches were replicated across nodes. In the new model, application servers on their nodes are clients of the cache server, which is on its own node. This means that the failure of a node is less likely to cause a cluster-wide failure; cache requests go to the cache server rather than to another node in the cluster.
  • Cache server installed through the RPM. In the new model, the cache server feature is distributed in the RPM and comes with its own set of scripts. You set up a cache server in a manner similar to setting up an application server. See Managing Cache Servers for more.
  • More memory needed. In the new model, overall application memory requirements across cluster nodes are higher because they now include the additional JVMs for the separate cache server. For more information on what's needed for a cache server, see the System Requirements.

Changes That Can Affect Configuration

  • Multicasting replaced with manual configuration for clusters. In the previous clustering model, nodes communicated via User Datagram Protocol (UDP). This protocol enabled multicasting, through which potential cluster members could discover a cluster. In the new model, which uses TCP instead of UDP for communication, each cache server must be manually configured with the addresses of the other cache servers. You also configure the application with the addresses of the cache servers as part of setup. For more on setting up servers, see Managing Cache Servers, as well as Managing an Application Cluster.
  • Cache sizes are no longer individually set. Cache evictions in the previous model were determined on a per-cache basis. In the new system you do not size individual caches; rather, the server handles evictions automatically based on heap usage.
  • Creating a cluster now requires specifying a cache server. In the previous model, caches were spread among all nodes in the cluster. In a clustered configuration with the new model, the caching service now requires a separate cache server (it's not possible to configure the application on a cluster without specifying a cache server).
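
To make heap-based eviction more concrete, here is a minimal sketch of a cache that evicts entries when memory use crosses a threshold instead of enforcing a per-cache size. Everything in it is an assumption for illustration: the PressureEvictingCache class, the caller-supplied entry sizes, the fixed byte budget standing in for JVM heap, and the least-recently-used eviction order. The default threshold of 0.75 mirrors the 75 percent figure this topic gives for near caches; the cache server's own threshold is not stated here.

```python
from collections import OrderedDict


class PressureEvictingCache:
    """Hypothetical cache that evicts under memory pressure, not by item count."""

    def __init__(self, budget_bytes, threshold=0.75):
        self.budget_bytes = budget_bytes  # stand-in for available heap
        self.threshold = threshold        # fraction of the budget that triggers eviction
        self.used_bytes = 0
        self._entries = OrderedDict()     # key -> (value, size), oldest first

    def put(self, key, value, size):
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[1]
        self._entries[key] = (value, size)
        self.used_bytes += size
        self._evict_if_needed()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)    # mark as most recently used
        return entry[0]

    def _evict_if_needed(self):
        # Evict least recently used entries until usage is back under the limit.
        limit = self.budget_bytes * self.threshold
        while self.used_bytes > limit and self._entries:
            _, (_, size) = self._entries.popitem(last=False)
            self.used_bytes -= size
```

Note that no individual cache size appears anywhere in the sketch; the only tuning input is overall memory use.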

Changes That Can Affect Performance

  • Timeout for long-running cache operations. In the previous model, each application node had a part of the cache and each of those cache parts ran in its application server's JVM. This caused problems when a failing cluster node repeatedly left and rejoined the cluster, affecting both the application and Coherence. In the application, remote nodes had to wait for Coherence to respond (or time out) while trying to connect to the failing node.

    In the new model, long-running cache operations such as this will time out and return null after 500 ms. In this way an unresponsive cache node won't cause the application to hang. A cluster node's failure affects only the load balancer and clustering technology; if the failing node is the current cluster master, the application will elect a new master node.

  • Memory pressure, rather than time, determines cache item eviction. In the previous model, near caches were much smaller and evicted cache items by timing them out (using short timeout limits). In the new model, the near cache in each application server's process evicts cache items when heap usage goes over 75 percent.
  • Multiple cache servers preserve cached data. If you're running more than two cache servers, cached data is now stored on more than one cache server. This means that if a node goes down, cache data won't be lost. (Note that this doesn't mean that the data is automatically replicated to another node to maintain the replication factor of the data.)
    CAUTION:
    If you're setting up more than one cache server machine, you must use three or more. The CACHE_ADDRESSES value should list them in a comma-separated list. Using only two cache servers is not supported and can cause data loss.
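
The server-count rule in the caution above can be checked before a cluster is started. The following is a minimal sketch, assuming a hypothetical helper named parse_cache_addresses and placeholder host names; it is not part of the product's tooling. It splits a comma-separated CACHE_ADDRESSES value and rejects the unsupported two-server configuration.

```python
def parse_cache_addresses(value):
    """Split a comma-separated CACHE_ADDRESSES value and validate the count.

    Hypothetical helper: one cache server is accepted, two are rejected,
    and three or more are accepted, following the caution above.
    """
    addresses = [part.strip() for part in value.split(",") if part.strip()]
    if not addresses:
        raise ValueError("at least one cache server address is required")
    if len(addresses) == 2:
        raise ValueError(
            "two cache servers are not supported; use one, or three or more"
        )
    return addresses


# Placeholder host names, used only for illustration.
servers = parse_cache_addresses(
    "cache1.example.com, cache2.example.com, cache3.example.com"
)
```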