Oracle Coherence vs Gigaspaces XAP

I’ve been fortunate enough to work (read: get to play) with two leading data/computing grid solutions in commercial projects over the last year — so here’s a short summary of differences between Oracle Coherence and GigaSpaces XAP.

If this is a topic that interests you, you might also be interested in attending a free one-day conference on cloud and grid technologies in London on the 9th of July (see gamingscalability.org for more information). At the event I’ll present an experience report from one of the projects I’ve mentioned here and go into much more detail on what we got out of it.

Data Cache

Both systems support deploying a data grid on multiple machines and automatically manage routing, fault tolerance and failover. Both grids support passive data caches and sending code to be executed on the node where a particular object resides, rather than pulling the object from the cache and then running a command. Both grids support querying objects by their properties.
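
As a minimal sketch of the send-code-to-data idea, here is roughly what a Coherence entry processor looks like (the Account class and cache name are hypothetical; GigaSpaces offers the equivalent through its executors and event containers):

    import java.io.Serializable;

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.InvocableMap;
    import com.tangosol.util.processor.AbstractProcessor;

    // Hypothetical domain object stored in the grid.
    class Account implements Serializable {
        double balance;
        void credit(double amount) { balance += amount; }
    }

    // The processor is shipped to the node that owns the entry, so the
    // Account never crosses the network; only the result travels back.
    public class CreditProcessor extends AbstractProcessor implements Serializable {
        private final double amount;

        public CreditProcessor(double amount) { this.amount = amount; }

        public Object process(InvocableMap.Entry entry) {
            Account account = (Account) entry.getValue();
            account.credit(amount);
            entry.setValue(account);       // write the mutated object back
            return account.balance;
        }

        public static void main(String[] args) {
            NamedCache accounts = CacheFactory.getCache("accounts");
            accounts.put("account-42", new Account());
            Object balance = accounts.invoke("account-42", new CreditProcessor(100.0));
            System.out.println("New balance: " + balance);
        }
    }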

Both grids support event-driven notifications when objects are added to or removed from the space. Coherence has notifications that go out to external clients (UI apps, for example) and supports continuous queries that push updates to the client without polling. GigaSpaces only has notifications internally in the grid, meaning that you can set up a processor to receive events about new, updated and removed objects matched by a particular template.
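
For illustration, the Coherence side might look like the sketch below, assuming a hypothetical “orders” cache whose values expose a getStatus accessor:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.cache.ContinuousQueryCache;
    import com.tangosol.util.MapEvent;
    import com.tangosol.util.MapListener;
    import com.tangosol.util.filter.EqualsFilter;

    public class OpenOrdersView {
        public static void main(String[] args) {
            NamedCache orders = CacheFactory.getCache("orders");

            // A live, locally materialised view of all matching entries;
            // the grid pushes changes to it, so the client never polls.
            ContinuousQueryCache openOrders = new ContinuousQueryCache(
                    orders, new EqualsFilter("getStatus", "OPEN"));

            openOrders.addMapListener(new MapListener() {
                public void entryInserted(MapEvent e) { System.out.println("opened: " + e.getKey()); }
                public void entryUpdated(MapEvent e)  { System.out.println("changed: " + e.getKey()); }
                public void entryDeleted(MapEvent e)  { System.out.println("closed: " + e.getKey()); }
            });
        }
    }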

Both systems have the concept of a local cache for remote data partitions which automatically updates when the remote data changes (Coherence calls this a “near cache”). Coherence supports lots of caching topologies which can be flexibly configured, while GigaSpaces only supports a local partition, a global space and a local cache of the global space. Local caches in GigaSpaces are really for read-only access (reference data).
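
A near cache is normally declared in Coherence’s cache configuration XML, but as a minimal programmatic sketch (the cache name is hypothetical):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.net.cache.LocalCache;
    import com.tangosol.net.cache.NearCache;

    public class ReferenceDataClient {
        public static void main(String[] args) {
            NamedCache back = CacheFactory.getCache("reference-data");

            // Size-limited local front map in front of the distributed back
            // cache; Coherence keeps the front coherent via event listeners.
            NearCache near = new NearCache(new LocalCache(1000), back);

            Object rate = near.get("EUR/USD"); // first read goes to the grid
            rate = near.get("EUR/USD");        // repeat reads are served locally
        }
    }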

Both grids support .NET/Java interop to some level. Coherence does this by requiring you to specify a serializer implementation for your class on both ends, in which you basically just specify the order in which fields are serialized and deserialized. I haven’t tried out the GigaSpaces solution for interop. According to the documentation, if you follow a naming convention on both sides (or override it with attributes and annotations), the grid will transform POJOs to POCOs and back without problems. Again, not having tried this myself, I cannot actually tell you whether it works.
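
For example, the Java side of a POF-serialized class might look like this (the Trade class is hypothetical; a matching implementation with the same field indexes would sit on the .NET side):

    import java.io.IOException;

    import com.tangosol.io.pof.PofReader;
    import com.tangosol.io.pof.PofWriter;
    import com.tangosol.io.pof.PortableObject;

    public class Trade implements PortableObject {
        private String symbol;
        private int quantity;

        public Trade() { }  // POF deserialisation needs a no-arg constructor

        // The numeric field indexes are the contract: they must match the
        // order used by the deserialiser on the other side of the wire.
        public void readExternal(PofReader in) throws IOException {
            symbol   = in.readString(0);
            quantity = in.readInt(1);
        }

        public void writeExternal(PofWriter out) throws IOException {
            out.writeString(0, symbol);
            out.writeInt(1, quantity);
        }
    }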

Processing

GigaSpaces doesn’t just allow you to send code to the objects; it is actually designed around an event-driven processing model in which objects are sent to processing code, and the same processing code runs in each data partition. Events for processing are specified by example, matching templates on classes and non-null properties, and GigaSpaces manages thread pools and other execution aspects for you automatically. Events can be triggered by the state of entities in the grid, or by commands arriving to be executed on the grid. It also has a fully transactional processing model, so if an exception gets thrown, everything rolls back and another processor picks up the command from the space again. It integrates with Spring transactions, so transactional processing development is really easy.
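
As a rough sketch, an OpenSpaces polling event container looks something like this (the Payment class is hypothetical, and the container is registered through the processing unit’s Spring context; note the Boolean wrapper, since null properties act as wildcards in template matching):

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import org.openspaces.events.EventDriven;
    import org.openspaces.events.EventTemplate;
    import org.openspaces.events.adapter.SpaceDataEvent;
    import org.openspaces.events.polling.Polling;

    // Hypothetical space object; null properties act as wildcards.
    @SpaceClass
    class Payment {
        private String id;
        private Boolean processed;

        @SpaceId(autoGenerate = true)
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }

        public Boolean getProcessed() { return processed; }
        public void setProcessed(Boolean processed) { this.processed = processed; }
    }

    @EventDriven
    @Polling
    public class PaymentProcessor {

        // Match-by-example: take any Payment whose processed flag is FALSE.
        @EventTemplate
        public Payment unprocessedPayments() {
            Payment template = new Payment();
            template.setProcessed(Boolean.FALSE);
            return template;
        }

        // Invoked by a grid-managed thread pool, transactionally when a
        // Spring transaction manager is configured; an exception rolls
        // back and returns the object to the space for another processor.
        // The returned object is written back to the space.
        @SpaceDataEvent
        public Payment process(Payment payment) {
            payment.setProcessed(Boolean.TRUE);
            return payment;
        }
    }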

Coherence has a reference command pattern implementation, not as part of the basic deployment but as a library in the Coherence Incubator project. It allows you to send commands as data grid objects to entities in the data grid, but it cannot directly invoke events based on entity properties in the grid. Coherence also has very limited support for transactions: the JCA container gives you a last-logging-resource simulation but no real transactional guarantees.

Deployment

GigaSpaces is designed to replace application servers, so it has a nice deployment system that will automatically ship your application code across the network to the relevant nodes, as well as cloud deployment scripts that will start up machines on EC2. Until recently the scripts were a bit unreliable, but version 6.6.4 fixed that. Coherence does clustering itself, but it was not intended to replace application server functionality; when it comes to deployment, you have to do it yourself. There is a JCA connector for application servers, which I’ve tried with WebLogic (version 3.3 of Coherence finally works out of the box with this), but there are lots of reasons why you would not want to run the whole grid inside WebLogic clusters or something similar, and would rather have it as a separate cluster.

On the other hand, Coherence has a pluggable serialization mechanism (POF) which would theoretically allow us to run multiple versions of the same class in the grid and hot-deploy a new version of the application on nodes incrementally, without downtime (I haven’t tried this myself yet, though, so I don’t know whether it really works like that). GigaSpaces applications (processing units) are split into two parts: a shared library distributed to all the applications, and the processing-unit-specific code. Shared libraries cannot be redeployed after the grid starts, so hot deployment of processing-unit-specific code is fine, but not of classes for data that is actually stored in the grid. This is apparently going to change in version 7. Until then, the best bet for hot deployment on GigaSpaces is to split the data format and the business logic into separate classes and JARs, as sketched below. I’m not too happy about this, but once the class loading changes we might go back to nice object design.
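
To make that workaround concrete, here is a minimal sketch of the split; the class and JAR names are hypothetical:

    // --- Order.java, packaged in shared-datamodel.jar (hypothetical) ---
    // Loaded by the grid-wide class loader; instances live in the space,
    // so this JAR cannot be hot-redeployed while the grid is running.
    public class Order implements java.io.Serializable {
        private String id;
        private double amount;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public double getAmount() { return amount; }
        public void setAmount(double amount) { this.amount = amount; }
    }

    // --- OrderValidator.java, packaged in the processing unit JAR ---
    // No grid data depends on these classes, so the PU can be redeployed
    // without restarting the grid.
    public class OrderValidator {
        public boolean isValid(Order order) {
            return order != null && order.getAmount() > 0; // business rules evolve here
        }
    }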

Scaling

Both grids seem to scale enough to deal with the problems I am fighting with (on the order of ten computers in a grid; I haven’t tried them on deployments of hundreds). However, Coherence scales dynamically: you can add more nodes to the cluster on the fly, without stopping the application, which allows you to scale up and down on demand. GigaSpaces deploys data to a fixed number of partitions, set for the lifetime of the data space. If a machine goes down, a backup partition will take over, and on clouds you can even have a new machine instance started up for you automatically, but you cannot increase or decrease the number of partitions after the grid has started.
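
For illustration, growing a Coherence cluster amounts to starting another storage-enabled JVM with the same cluster configuration; a trivial sketch:

    import com.tangosol.net.DefaultCacheServer;

    public class ExtraStorageNode {
        public static void main(String[] args) {
            // Joins the running cluster (discovered via the configured
            // multicast or WKA settings) and takes ownership of its share
            // of the partitions; data rebalances without downtime.
            DefaultCacheServer.main(args);
        }
    }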

Persistency

Both grids have read-through and write-through support. GigaSpaces comes with a Hibernate-based asynchronous persistency system with mirroring (allowing you to move database writes to a separate node) out of the box. Although the idea is nice, in its current incarnation it has quite a few rough edges, so we ended up rolling our own. For real read-through and write-through to work on Coherence, you need to ensure that you have configured and deployed the persistency code to all the nodes, which can be a bit of a challenge if part of the grid is running in an application server and part of it outside (especially with non-Coherence clients). Since GigaSpaces handles deployment for you, it makes configurations such as these a bit easier to run. GigaSpaces also has the concept of an initial load that will pre-populate the memory space with objects from the database, and it supports on-demand cleanup from the grid without deleting objects in the persistent store.
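
On the Coherence side, read-through and write-through hinge on a CacheStore implementation deployed to every storage node. A minimal sketch, with OrderDao and Order as hypothetical stand-ins for real JDBC or Hibernate code:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import com.tangosol.net.cache.CacheStore;

    // Hypothetical stand-ins for real data-access code.
    class Order { }
    class OrderDao {
        Order findById(String id) { return new Order(); }
        void save(Order order) { }
        void delete(String id) { }
    }

    public class OrderCacheStore implements CacheStore {
        private final OrderDao dao = new OrderDao();

        // Read-through: called by the grid on a cache miss.
        public Object load(Object key) { return dao.findById((String) key); }

        public Map loadAll(Collection keys) {
            Map result = new HashMap();
            for (Iterator it = keys.iterator(); it.hasNext(); ) {
                Object key = it.next();
                result.put(key, load(key));
            }
            return result;
        }

        // Write-through: called after the grid entry is updated.
        public void store(Object key, Object value) { dao.save((Order) value); }

        public void storeAll(Map entries) {
            for (Iterator it = entries.entrySet().iterator(); it.hasNext(); ) {
                Map.Entry e = (Map.Entry) it.next();
                store(e.getKey(), e.getValue());
            }
        }

        public void erase(Object key) { dao.delete((String) key); }

        public void eraseAll(Collection keys) {
            for (Iterator it = keys.iterator(); it.hasNext(); ) {
                erase(it.next());
            }
        }
    }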

So when should you use what and why?

There is no clear winner in this comparison, because the two products seem to be suited to different problems. GigaSpaces is a heavyweight replacement for application servers and, in my view, best suited to distributed transactional processing. Its processing model is much more flexible than the one in Coherence and has more features, not least proper transaction support. Coherence seems to be a much better solution for passive, read-mostly data grids: it can grow and shrink dynamically, it supports much more flexible topologies, and it has more powerful libraries for client applications.


18 thoughts on “Oracle Coherence vs Gigaspaces XAP”

  1. Nice article.

    Quick question though: aren’t there other vendors in this space like GemStone’s GemFire and Terracotta? Are you going to compare them to Coherence and XAP as well? And what about memcached/memcachedb?

  2. Thanks, very helpful review!

    I have to say that the lack of full JTA support in Coherence is disappointing. There are several tech talks by Cameron Purdy (former CTO of Tangosol) where he says things that can be interpreted as if Coherence provides ACID guarantees.

  3. @Pierce

    I haven’t used Terracotta or GemFire commercially, so I don’t think I can compare them to these two. That shouldn’t stop you from publishing a review, though. If it is good, send me a ping-back and I’ll publish the link in this post.

  4. “Gigaspaces only has notifications internally in the grid, meaning that you can set up a processor to receive events about new, updated and removed objects matched by a particular template.”

    That is not correct; GigaSpaces allows a remote client to register for notifications exactly as if it were an embedded service collocated within the grid.
    It also supports continuous queries with the View feature, where the client doesn’t need to poll data from the space; it is done for him behind the scenes.

    “Both grids support .NET/Java interop to some level”

    GigaSpaces supports in-memory, two-way interoperability between .NET and Java. This makes it possible to have .NET processing inside the grid as well: you can deploy a processing unit that contains only .NET business logic and data model, without implementing any Java counterpart, and have it run and be managed within the service grid. You can also have interoperable processing units that interact with each other. Because interoperability is in-process and two-way, GigaSpaces pluggable components, such as custom persistency implementations, are supported in .NET as well; in fact, it comes with a built-in NHibernate-based asynchronous persistency that can be used in the same fashion as the Hibernate implementation, and any other custom persistency implementation can be written just as it can in Java. To sum up, you can have a pure .NET environment and make use of all GigaSpaces grid components without having to implement anything in Java, or have fully interoperable environments.
    GigaSpaces comes with a separate offering for .NET, named XAP.NET, which allows a pure .NET developer to avoid dealing with Java at all.

    Eitan

  5. Yep, ACID shouldn’t be a problem if Coherence is the only resource in the transaction. LLR comes into play with multiple resources and coordination.

  6. Gojko

    Finally a well balanced report – well done.

    Further to Eitan’s comments, I wanted to emphasize a few corrections.

    Every component in GS can be accessed locally or remotely in the same way. This is done just by changing the space URL on the processing unit: a remote URL would point to a remote space, and a local URL would point to a local embedded space.

    This means that with local-view and local-cache you can still use complex queries, not just id-based queries. The local cache is primarily optimized for read-mostly scenarios but is not limited in terms of API, i.e. you can use the same API for read and for write, and you can also use transactions.

    The same goes for the topologies: we support partition, partition with backups, and replication. In each of those topologies you can define whether the replication is synchronous or asynchronous. All the topologies are completely abstracted from the application API and are handled mostly through configuration.

    Basic components

    Basic topologies

    “If a machine goes down, a backup partition will take over and on clouds you can even have a new machine instance started up for you automatically, but you cannot increase or decrease the number of partitions after the grid has started.”

    As far as I know this is the same with both products, except that we use explicit partition instances and Coherence uses implicit logical partitions. In both cases dynamic scaling means changing the number of running partitions per JVM container (GSC in our terminology). You can start with 100 partitions even if you only have two machines, and spread those partitions out as soon as more resources become available. When a machine goes down, the system will not wait until a new machine becomes available – it will scale down to the existing containers as long as it detects that there is enough memory and CPU capacity available. I’m not sure how scaling down works with Coherence, but one thing to check is whether scaling down could lead to out-of-memory issues when there is not enough capacity on the available machines.

    I would refer to the main difference between the two approaches as black-box (Coherence) vs white-box (GS). In our philosophy, clustering behaviour should be managed in the same consistent way across the entire application stack, which means that when we manage a partition or scale the application, we scale not just the data but also the business logic, the messaging and any other component that needs to be associated with it. Our experience showed that the black-box approach is simpler to get started with, but can become fairly complex once you start to deal with scaling on other layers of the application, such as the business logic, the messaging layer or the web layer. In many cases this leads to different clustering models across the application tiers, which means more moving parts, complexity and so on. For example, in our case, if a data grid container or a web container crashes, the process of maintaining their high availability is exactly the same.

    As of the XAP 7.0 release we have also added the ability for users to write their own custom SLA and scaling behaviour programmatically – see the reference here.
    This will enable you to monitor the entire application’s behaviour and decide what threshold should trigger scaling out or down, automate the entire deployment, and manage self-healing when there is a failure.

    This wouldn’t be possible if we had taken the black-box approach.

    HTH
    Nati S.
    CTO GigaSpaces

  7. Pingback: Space-Based Architecture vs Gigaspaces « Tales from a Trading Desk

  8. Hi there,
    Gojko, this is an amazingly mediocre post from a person who gave a presentation at QCon.
    I think you are approaching Coherence with the wrong mindset, and your summary statement about Coherence being suitable for a read-only cache is hilarious.
    Coherence requires the correct mindset and a shift in culture. I guess for a JEE developer it looks too bare-bones, insecure and low-profile. My impression is that it is targeted at hardcore developers. It definitely won’t hold your hand, and sometimes it even bites badly! But it is a reliable and solid problem-solver. I guess you don’t have the right problems at hand.

    I am not affiliated with Coherence by any means (unfortunately). I am just a developer that had to choose and is very happy with the choice he made.

    CU,
    Georgi

  9. Georgi,

    I’m sorry you feel that way, but I guess you can’t please everyone :) I did not write that Coherence is for read-only but for read-mostly, and there is a huge difference between the two concepts. It doesn’t look too insecure or bare-bones to me, and I don’t know what in my post gave you that impression; I consider it a really good and relatively lightweight distributed caching system. And I do think that read-mostly caches are what it is really good at, taking the load off the database for publishing frequently read data.

  10. I carefully reread your post and I see we are looking at the problem from different angles. I am still not seeing any serious arguments that clearly show GigaSpaces to be the better transaction processing product. Maybe my impedance mismatch comes from the fact that you mean ‘distributed transactions’, and that’s not the kind of transaction I would think of when discussing heavyweight parallel processing solutions.

    Coming from the betting business, you might know which product was chosen by one of the largest UK betting operators… AFAIK they use it for processing (transactional or not), not just for read-mostly caching.

    P.S. Coherence’s partition count is also fixed and cannot be changed while the cluster is running.
    P.P.S. this is your blog, so please don’t feel obliged to please me. We are all free to express our opinions, aren’t we?

  11. Georgi,

    Like any other tool, XA has its uses and can help if used properly or hurt a lot if misused. Not every problem in a distributed system is the same, and some require different solutions than others. XA is a performance hog and should not be used for frequent operations, but at least on the systems I work with there are loads of operations that run once or twice a second, where you aren’t going to notice a big difference. Reference data is often read-mostly and not updated that often. Caching it makes a lot of sense, and that is where data grids come in.

    If you are putting a cache on top of an existing system where the database is the most authoritative source of truth, changing all the apps is typically too costly. The database still has to remain the source of truth, but updates to reference data should be coordinated with the cache – often by enqueuing a notification to change the cache, or by changing it directly. Unless you are using Oracle AQ or something similar, which makes the queues use the same transaction manager as the database, you have two resources in a transaction: the database and a queue or cache. And that is where full XA support really comes in. It guarantees that the database update and the queue/cache update will both succeed or both fail, so that there is no data corruption or inconsistency in the cache. Coherence has LLR XA support, so there is a chance that the database update goes through and the grid commit fails because of temporary network issues. Without proper XA, you need to code around this, make simpler atomic operations that wrap one resource in the other’s transaction, recover from errors manually, etc. So a few cleverly designed XA operations can make the bulk of the code, where performance isn’t such an issue, a lot simpler, easier to maintain, less error-prone and easier to troubleshoot, leaving you with more time to deal with the parts of the system where you really need to focus and where XA should not apply.
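
    To make the two-resource case concrete, here is a minimal JTA sketch; the JNDI name is the standard one, and the two helper methods are hypothetical stand-ins for the XA-enlisted database update and JMS send:

        import javax.naming.InitialContext;
        import javax.transaction.UserTransaction;

        public class ReferenceDataUpdate {

            public void updateAtomically() throws Exception {
                UserTransaction tx = (UserTransaction)
                        new InitialContext().lookup("java:comp/UserTransaction");
                tx.begin();
                try {
                    updateReferenceDataInDatabase(); // enlists the XA datasource
                    enqueueCacheInvalidation();      // enlists the XA JMS queue
                    tx.commit();                     // both commit or neither does
                } catch (Exception e) {
                    tx.rollback();
                    throw e;
                }
            }

            private void updateReferenceDataInDatabase() { /* hypothetical */ }
            private void enqueueCacheInvalidation()      { /* hypothetical */ }
        }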

    I just like the option of having it there if I need it and when I need it.

    The reason why I suggested GigaSpaces over Coherence for processing is that it is built around a processing model and has a lot of flexibility in terms of how you dispatch tasks and how they are picked up and processed; it automatically hides the target objects from other processors running in parallel, has Spring declarative transaction support, data mirroring, etc., so it allows you to write parallel processing apps efficiently. The command pattern in Coherence is a good start, but it is not as fully featured as GigaSpaces. That doesn’t mean that you cannot do parallel processing with Coherence – of course you can – but in my opinion it is easier to write with GigaSpaces because the framework gives you more.

  12. Hi -

    > The reason why I suggested Gigaspaces over Coherence for
    > processing is that it is built around a processing model and
    > has a lot of flexibility in terms of how you dispatch tasks

    Coherence is a fully clustered data grid, and as such it is ideal for building large-scale state machines.

    Gigaspaces is a JavaSpaces implementation, and as such it is ideal for building master/worker (task dispatching) solutions.

    I wish you luck with your selection, and suggest you keep an eye on the Coherence Incubator (http://coherence.oracle.com/display/INCUBATOR/) for more than just the command pattern ;-)

    Peace,

    Cameron Purdy | Oracle Coherence
    http://coherence.oracle.com/

  13. Hi Gojko,

    I think your explanation of LLR XA resources is incorrect. From what I know (which could be wrong ;) LLR resources are usually prepared last but committed first, and hence the problem you attribute to inconsistent state is incorrect. It has nothing to do with Coherence (per se) but is often a problem with underlying XA TMs. Having built a non-logging XA resource for Coherence that is part of a multi-billion-dollar trading platform, I’m pretty confident that it’s OK.

    – Brian

    BTW: If the last resource in any XA transaction commit fails, you’re pretty much in an “unstable state”, especially if the resource can’t be recovered cleanly. LLR basically makes this explicit and lowers the XA implementation SLAs ;). Failure during prepare is always recoverable. Failure during commit typically means someone has to talk to a DBA.

  14. Hi,

    > Gigaspaces is a JavaSpaces implementation, and as such it is ideal for building
    > master/worker (task dispatching) solutions.

    Ideal, but not limited to that. GigaSpaces gives you Jini + Project Rio + JavaSpaces.

    I never considered the benefits of JavaSpaces without considering those of Jini; maybe that is what is missing in these discussions?

    Ale

  15. Since SEs and PMs from the other companies have chimed in, I’ll add my opinion. Having worked for Oracle and now working for GemStone on GemFire, I have done a lot of comparison of the products.

    Gojko, I applaud your efforts to help people compare the products. In my experience, doing my own comparisons and helping customers do theirs, the differences between the products tend not to show up until you go much deeper. While the features you describe are common to all of the top products, each of the vendors implements its solution with some significant differences in the underlying technology, and these differences tend to show up only in full-scale performance tests. These tests, unfortunately, are tricky to build and execute, and if you decide to undertake them, I highly recommend you engage each vendor to help you configure and tune each test.

    All cluster products will tend to offer some level of dynamic capacity capability. However, without the right options and tuning parameters, small performance problems can cascade into total cluster collapse. This is one area that we have dedicated significant effort to, making sure the cluster behaves well in cases of adversity (node failure, congested networks, slow message consumers, etc). This is an extremely difficult problem to solve, and is one of the biggest reasons for choosing a commercial product over an open-source solution. It is also a difference that doesn’t show up with a paper comparison of the products.

    I would echo the comments from some of the other folks that warn of the problems of transactions, distributed or otherwise. The products tend to support them, but that doesn’t make their use a good idea. Yes, it takes some effort to code around the edge conditions properly, but this effort is more than rewarded when it comes to the overall scalability of the system. I would direct you to an independent (well, now he works for Microsoft) paper on the topic by Pat Helland (http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf).

    Good luck with your project.

    David Brown – GemStone GemFire
