Didn't understand much from the linked page, but I found this website (from Pivotal, the commercial entity behind Geode) quite informative. Perhaps it's useful to others.
China National Railways use Geode to run railway ticketing for the entire
country with a 10 node cluster, managing 2 TB of "hot data" in memory,
and 10 backup nodes for high availability and elastic scale.
Holiday travel periods [Chinese New Year's] create peaks of 15,000 tickets
sold per minute, 1.4 billion page views per day and 40,000 visits per second.
The guys have been sitting on a forested Oregon riverbank for 3 decades (the place is called Beavertown for a reason :) thinking straight and clear in Smalltalk... :)
I know (I was at some point talking about joining; I'm already here in SV), and this is why I wrote only "3 decades": I don't know the precise dates of GemFire, I only suppose it began in the early 2000s. My simple arithmetic mistake though: 2002 - 1982 is 2 decades, sorry, I see where the confusion comes from :)
FWIW, GemFire is written entirely in Java and has native clients for C++ and .NET (as well as REST clients). The product was built from the ground up in Java and runs highly scaled, low-latency systems across the globe. The place is called Beaverton, and while there are many beautiful forested riverbanks, the Pivotal team does not sit next to one.
The landing page and documentation should be reconsidered; there are lots of unanswered questions:
- "Distributed, in-memory database": how does it compare to SAP HANA? Or Redis? If it's a database, where is the query language? If you have your own query language other than SQL, please show it to us.
- "Performance is key": tell us about some benchmarks.
- "Consistency is a must": CA or CP in the CAP theorem?
- Can I fully drop an RDBMS in favor of Geode?
- The website is built as a landing/PR page for the product, then all of a sudden switches to Community/Contributors/Getting Started columns, with a very long list of mailing lists, little content for contributors, and the very important part (Getting Started) presented as a tiny link :( Please put some useful info about the product there; you already have the Community and Contribute menus.
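On the query-language question: Geode/GemFire does ship its own SQL-like query language, OQL (Object Query Language), which queries objects stored in regions. A small example (the region and field names here are made up for illustration):

```
SELECT DISTINCT o.orderId, o.total
FROM /orders o
WHERE o.total > 100
```

Regions appear in the FROM clause as paths (e.g. /orders), and the WHERE clause navigates the fields of the stored objects.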
CA is essentially possible only if the system is read-only, with all updates synchronized whenever a partition is not present [which is the majority of the time in the real world].
Geode is write-intensive and generally optimizes for consistency over availability. That said, there's a lot built into Geode to ensure availability as well.
Redis and in-memory data grids are pretty different animals. I would characterize IMDGs like Geode as concurrent-write-intensive, with flexible data models. Geode also scales out better than Redis, in a more automated fashion.
Redis is a great read-intensive cache. It also has a powerful data model, but you have to use its data structures. Example: if you want to run calculations on lists or sets, Redis has powerful operations you can call.
IMDGs such as Geode were built with the rise of automated trading in the finance industry.
Geode also understands Redis protocol, so you can point your Redis clients to Geode.
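If I recall the early incubating docs correctly, the (experimental) Redis adapter was enabled by starting a server with a Redis port; the exact flag names may have changed since, so treat this as a sketch:

```
gfsh> start server --name=server1 --redis-port=6379 --redis-bind-address=127.0.0.1
```

After that, a stock Redis client pointed at that host/port talks to Geode instead of a Redis server.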
This is interesting, because the problem with Redis data structures is that they cannot scale beyond the memory available to the Redis server. The further limitation that the Redis server is single-threaded and cannot really be scaled up only makes the problem worse.
With Geode your Redis data structures can scale horizontally.
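The "scale horizontally" part boils down to partitioning: each key is hashed to exactly one member of the cluster, so a logical data structure can be spread across the memory of many nodes. A minimal sketch of the idea (the node names and the modulo scheme are illustrative, not Geode's actual bucket algorithm):

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]

def owner(key: str) -> str:
    """Map a key to the single node that holds it (modulo-hash partitioning)."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

# Entries of one logical hash/set land on different nodes,
# so total capacity is the sum of the nodes' memory, not one node's.
placement = {k: owner(k) for k in ["user:1", "user:2", "user:3", "user:4"]}
```

The same hash always yields the same owner, which is what lets any client route a request for a key directly to the right node.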
Redis is single-threaded, so everything it does on a per-node basis is implicitly atomic. You can also force the single thread to handle a block of commands from a single socket at once by using MULTI (otherwise commands can be interleaved with other commands from other sockets).
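To illustrate the single-thread point: because one thread drains the command queue, each command (or each MULTI block, treated as one unit) runs to completion before the next starts, so no locking is needed for atomicity. A toy sketch of that execution model (this is not Redis code; the SET/INCRBY handling is simplified):

```python
import queue
import threading

store = {}
commands = queue.Queue()

def server_loop():
    # Single server thread: each batch runs to completion before the next one,
    # so a MULTI/EXEC block can never be interleaved with another client's commands.
    while True:
        batch = commands.get()
        if batch is None:
            return
        for op, key, arg in batch:
            if op == "SET":
                store[key] = arg
            elif op == "INCRBY":
                store[key] = store.get(key, 0) + arg

server = threading.Thread(target=server_loop)
server.start()

commands.put([("SET", "x", 0)])                         # a plain command is its own batch
commands.put([("INCRBY", "x", 1), ("INCRBY", "x", 1)])  # MULTI ... EXEC: one batch
commands.put(None)                                      # shut the loop down
server.join()
# store["x"] is now 2
```

The atomicity falls out of the queue discipline, not out of any per-key locks.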
It's not guaranteed to be atomic under failure conditions. Also, the biggest difference between Redis and Geode is the "distributed" part, which involves maintaining these guarantees across a cluster of machines (which Redis demonstrably doesn't do).
Apache is happy to provide a home for any community that is willing to adhere to our governance rules and traditions. Competing projects are OK.
Projects are almost never rejected, because preparing a proposal for incubation is rigorous and many projects that would be a poor fit self-select out.
Source: former VP Apache Incubator, who has both helped prepare successful proposals and privately counseled projects who decided not to come to Apache.
I'm asking because a lot of Java "big data" stuff tend to prioritize Java clients (ZooKeeper, Kafka, Hadoop HDFS, Storm, VoltDB and HBase come to mind), and while there are sometimes clients in other languages, they tend to be second-class citizens that take years to reach feature/performance parity with the Java stuff.
For example, last I checked there still wasn't a mature, feature-complete Kafka client (consumer and producer with built in offset management) for Go.
GemFire (on which Geode is based) has fully featured C, C++, and C# clients with feature parity with the Java client. I don't know if Pivotal is going to open-source these clients too.
In the case of Geode, you can make a "Region" (think: table) persistent on disk and use a shared-nothing architecture [1] to avoid SPOFs.
What's also interesting is that we offer a very efficient way to recover data from disk as well [2], in the case of a crash of a single node or of the entire cluster.
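For reference, persistence is exposed through region shortcuts in gfsh; a persistent partitioned region can be created with one command (the region name here is illustrative):

```
gfsh> create region --name=orders --type=PARTITION_PERSISTENT
```

Entries then survive member restarts, and the shared-nothing disk stores on each member are used to rebuild the in-memory copy on recovery.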
FWIW, Pivotal is hiring in our Big Data team, largely based in Palo Alto. Geode (incubating), HAWQ (incubating), Greenplum, Pivotal HD, MADlib, etc. are all mostly developed with engineering effort that we donate.
Hit me up with an email (jchester@pivotal.io) or visit pivotal.io/careers if you're interested.
Datomic's immutable storage and time-travel query capabilities are awesome, and I often miss them in other DBs. But Datomic currently isn't designed for write-intensive workloads. And while you can shard Datomic's transactor and then combine multiple DBs in a query (http://nosql.mypopescu.com/post/19310504456/thoughts-about-d...), that's only going to get you so far.
However, Apache Geode lets you add custom indexes, so it might not be too hard to add Clojure's persistent data structures as a custom index scheme and hook in Apache Geode as a backend to Clojure Datalog.
All: this is a major project (with a long history: http://geode.incubator.apache.org/about) and a major release. It deserves a substantive thread, so please let's not get sidetracked by a language troll.
https://pivotal.io/big-data/pivotal-gemfire
===
I found this interesting deployment:
http://pivotal.io/big-data/case-study/scaling-online-sales-f...