While you can have a successful career with Coherence just being a get/set man, its true power is realized when you leverage the full scale of the cluster as a whole and exploit its distributed processing capabilities.
For the use cases I’ve implemented, the features I used most frequently, and that have gone head-to-head with incumbent products, are as follows:
- InvocationService - I have to admit that I took this for granted up until I went against IBM’s eXtreme Scale. Most organizations want to preload/warm the cache, and the InvocationService allows you to issue commands to each member in a distributed manner. Parallelizing this activity gains economies of scale, since load time and rebalancing can be kept to a minimum. Said another way, a million rows can be loaded in the time it takes to load 100,000 if you have 10 storage-enabled members. Each member is issued a command that details which rows it is responsible for loading. Coherence provides a number of the libraries required to handle this, including ‘retry’ functionality hooks, and abstracts all the threading/concurrency logic, which would be a nightmare to sort out, as IBM learned on this project. This is in direct contrast to eXtreme Scale’s capability, which relied on leveraging Java’s Executor classes. Basically, they had to roll their own distributed processing engine while on-site.
- Filters, Aggregators, and EntryProcessors - Before MapReduce and Hadoop came on the scene in such force, Coherence had equivalent functionality that was much easier to use. Filters provide the ability to apply conditional boolean logic against your data out of the box. Many fail to realize how powerful this is. In the bake-off, eXtreme Scale had nothing close to this and therefore had to code it. The requirement was to port a stored procedure’s logic, which took 30+ seconds to run, into something the grid could run. The implementation was based on an EntryProcessor that leveraged Filters and Aggregators. While I would love to say it was strategic coding ability, it wasn’t - I merely used the OOB tools. The end result was that the EntryProcessor, running a complex workflow, was an order of magnitude faster than IBM’s get() call.
- POF - Portable Object Format is Coherence’s proprietary, binary-optimized serialization format. It provides staggering object compaction. For example, an Item object that was 750 bytes with Java serialization is 31 bytes with POF. This has a rippling impact across the entire app, the cluster, and even your network, since it needs to handle the chatty cluster members.
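The preload pattern described above can be sketched as an Invocable dispatched to every service member. This is a minimal sketch, not the project code: the class name, the per-member partitioning rule, and the service name "InvocationService" (which must exist in the operational config) are all illustrative.

```java
import java.util.Set;

import com.tangosol.net.AbstractInvocable;
import com.tangosol.net.CacheFactory;
import com.tangosol.net.InvocationService;

// Hypothetical invocable: each storage-enabled member loads only its
// slice of the source data, so ten members load in roughly one tenth
// of the single-node time.
public class WarmCacheInvocable extends AbstractInvocable {

    @Override
    public void run() {
        int memberId = getService().getCluster().getLocalMember().getId();
        // Placeholder: select the rows assigned to this member (for
        // example, WHERE MOD(row_id, memberCount) = slot) and put them
        // into the target cache. The partitioning rule is illustrative.
    }

    public static void main(String[] args) {
        InvocationService svc = (InvocationService)
                CacheFactory.getService("InvocationService");
        Set members = svc.getInfo().getServiceMembers();
        // query() blocks until every member has run the invocable and
        // returns a Map of Member -> result.
        svc.query(new WarmCacheInvocable(), members);
    }
}
```

The point of the pattern is that Coherence handles dispatch, threading, and failure hooks; the only thing you write is the per-member loading logic.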
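The stored-procedure port itself isn’t public, so here is a generic sketch of the same OOB building blocks: a Filter for the conditional logic, an EntryProcessor that runs where the data lives, and an Aggregator for the MapReduce-style rollup. The Trade class, cache name, and reprice logic are illustrative stand-ins.

```java
import java.io.Serializable;

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.Filter;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.aggregator.DoubleSum;
import com.tangosol.util.filter.AndFilter;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.GreaterFilter;
import com.tangosol.util.processor.AbstractProcessor;

public class GridWorkflow {

    // Illustrative domain object; the real project objects are not shown.
    public static class Trade implements Serializable {
        private String status;
        private double notional;
        public Trade(String status, double notional) {
            this.status = status;
            this.notional = notional;
        }
        public String getStatus()         { return status; }
        public double getNotional()       { return notional; }
        public void setNotional(double d) { notional = d; }
    }

    // Entry processor: mutates each matching entry in place, on the
    // member that owns it - the workflow runs where the data lives.
    public static class RepriceProcessor extends AbstractProcessor {
        public Object process(InvocableMap.Entry entry) {
            Trade t = (Trade) entry.getValue();
            t.setNotional(t.getNotional() * 1.01);  // placeholder logic
            entry.setValue(t);
            return null;
        }
    }

    public static void main(String[] args) {
        NamedCache trades = CacheFactory.getCache("trades");
        trades.put("T1", new Trade("OPEN", 2_000_000d));

        // Conditional boolean logic out of the box, evaluated in
        // parallel across the cluster members.
        Filter open = new AndFilter(
                new EqualsFilter("getStatus", "OPEN"),
                new GreaterFilter("getNotional", 1_000_000d));

        trades.invokeAll(open, new RepriceProcessor());

        // MapReduce-style aggregation of a result back to the caller.
        Double total = (Double) trades.aggregate(
                open, new DoubleSum("getNotional"));
        System.out.println("Open notional: " + total);
    }
}
```

Nothing here is bespoke plumbing - every class except the two illustrative ones ships with Coherence, which is exactly the gap eXtreme Scale had to fill by hand.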
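The compaction comes from how POF encodes a class: fields are written against small integer indexes, with no class-name or field-name metadata in the stream. A minimal sketch of a PortableObject (field names and indexes are illustrative; the type must also be registered with a type id in pof-config.xml):

```java
import java.io.IOException;

import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Item implements PortableObject {
    private String sku;
    private int    quantity;
    private double price;

    public Item() {}  // POF deserialization requires a no-arg constructor

    // Each field is bound to an integer index. Dropping the per-object
    // class and field-name metadata that java.io.Serializable carries
    // is where most of the size win comes from.
    public void writeExternal(PofWriter out) throws IOException {
        out.writeString(0, sku);
        out.writeInt(1, quantity);
        out.writeDouble(2, price);
    }

    public void readExternal(PofReader in) throws IOException {
        sku      = in.readString(0);
        quantity = in.readInt(1);
        price    = in.readDouble(2);
    }
}
```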
Improvements to My Organization
The biggest improvement is in speed - everything is faster, from record retrieval to workflow processing.
Room for Improvement
Tooling around complex cluster config files, so issues can be identified before the cluster is stood up - and subsequently collapses. Cluster management tools that are independent of WebLogic. Dynamic cluster config rollout and rollback; ideally this would be used in dev, as a prod cluster should be locked down. I’d also like some sort of out-of-the-box GUI that illustrates cluster member vitals: storage, heap, off-heap, watermarks, evictions, etc.
Monitoring and configuration could be easier, and support for streaming data windows and the like isn’t available yet. Moreover, native cron (scheduling) capabilities and an async API would be nice to have, but those challenges can be overcome with third-party libraries. Lastly, native security features would alleviate some concerns and workarounds; however, I fully understand the impact on performance.
Use of Solution
I’ve used Coherence since 2008. I transitioned into consulting, where I led a number of projects across several organizations to define, install, and integrate clusters for maximum impact on critical business systems.
Customer Service and Technical Support
I have only needed an assist from Oracle once, and the issue turned out to be a config problem. The organization had a healthy support agreement, and Oracle was able to turn it around quickly. Perhaps one of the reasons I haven’t engaged them more is that I jumped into the community early on. I attended every Coherence SIG [Special Interest Group] meeting that I could and became friendly with a few of the developers.
Coherence is very easy to get running locally. Standing up, or even defining, a cluster is another task entirely. Each cluster has many ‘knobs’ to dial in. While this offers great flexibility, you should exercise caution when getting into areas of the config that you don’t understand.
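“Easy to get running locally” really is this small - with coherence.jar on the classpath, the snippet below starts (or joins) a cluster and gives you a distributed map. The cache name is arbitrary.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class HelloGrid {
    public static void main(String[] args) {
        // Starts or joins a cluster using the default configuration.
        NamedCache cache = CacheFactory.getCache("hello");
        cache.put("key", "value");
        System.out.println(cache.get("key"));
        CacheFactory.shutdown();
    }
}
```

Everything beyond this point - sizing, schemes, eviction, addresses - is where the real cluster-definition work lives.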
The objective of the project and the performance targets need to be kept in sight. Here are some questions to help drive the configuration files:
- Is your project read-heavy or write-heavy? This will dictate whether you should have more, smaller nodes vs. fewer, larger ones.
- Should members be storage-enabled or not?
- How much data does the app generally use? Would a near cache be beneficial?
- How often is your reference data used, and how much is there? That determines whether it should be replicated or not.
- How many members should there be? Do I need to use a prime number somewhere? Why?
- Do I need eviction policies, and what should they be based on?
- How do I tell if my cluster is too chatty?
- How will other apps leverage the cluster?
- Should I use WKA [Well-Known Addresses]? Will that prevent new members from joining?
It goes on and on, and we didn’t touch DR or monitoring.
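Several of those questions land in the cache configuration file. A sketch of the shape of the answers - every name, size, and eviction value here is an example to show where the knobs live, not a recommendation:

```xml
<!-- Illustrative cache-config fragment: near cache in front of a
     distributed scheme, with eviction limits on both tiers. -->
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>trades-*</cache-name>
      <scheme-name>near-distributed</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <near-scheme>
      <scheme-name>near-distributed</scheme-name>
      <front-scheme>
        <local-scheme>
          <eviction-policy>LRU</eviction-policy>
          <high-units>10000</high-units>  <!-- front-cache entry limit -->
        </local-scheme>
      </front-scheme>
      <back-scheme>
        <distributed-scheme>
          <backing-map-scheme>
            <local-scheme>
              <unit-calculator>BINARY</unit-calculator>
              <high-units>100m</high-units>  <!-- per-member watermark -->
            </local-scheme>
          </backing-map-scheme>
          <autostart>true</autostart>
        </distributed-scheme>
      </back-scheme>
      <invalidation-strategy>present</invalidation-strategy>
    </near-scheme>
  </caching-schemes>
</cache-config>
```

Read/write ratio, near-cache suitability, and eviction policy all show up as concrete elements here, which is why answering the questions first makes the file much easier to write.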
I’ve done both, and in most cases the projects didn’t have proper momentum until an SME was introduced and the questions above could be addressed. Most folks apply relational thinking to a cluster, and that generally doesn’t end well. While you can use rich objects, I’d look for a different model - something flat. Otherwise, you need to strictly define your cache strategy to keep hierarchies together (hard to do).
Other Solutions Considered
A side-by-side POC was done with IBM’s eXtreme Scale on a project. I also have experience with GemFire - and wish I didn’t.
Take the time to learn it and test all assumptions. For example, I was using push replication [PR] to satisfy a client’s disaster recovery [DR] requirement. All of a sudden, the primary cluster collapsed - it ran out of memory despite having high watermarks configured. As it turned out, the DR site connection had gone down and PR calls started to queue. The high-watermark calculation did not know about the PR queue. This was a very subtle use case, as I didn’t consider what would happen to the PR calls if the other end wasn’t available.