Storing Data the Cassandra Way

Comments

The talk was informative, but I had some issues with your wording. Before you make any "you have to do ALLLL this work" claims, you should learn more about the various options RDBMS offer.

RDBMS have more strategies than simple master-slave replication. There is also a great certification-based master-master replication strategy, which comes very close to what Cassandra offers in "ALL" or "QUORUM" consistency mode. Basically, all the principles you described as things you "had to do" for an RDBMS also need to happen in Cassandra land. The only difference is that Cassandra has this "built in", so the user is not bothered with it too much.
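To make the "QUORUM" comparison concrete, here is a minimal Python sketch of the arithmetic behind quorum reads and writes. It is an illustration only, not Cassandra's code: the function names are hypothetical, and it assumes a single data center where a quorum is a strict majority of the replicas.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """True when every read set must overlap every write set.

    If write_acks + read_acks > replication_factor, at least one replica
    that acknowledged the write is also consulted by the read.
    """
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("quorum for RF=3:", q)
    print("QUORUM write + QUORUM read overlap:",
          read_sees_latest_write(rf, q, q))
    print("ONE write + ONE read overlap:",
          read_sees_latest_write(rf, 1, 1))
```

The same overlap rule applies to any replicated store, which is the point above: the work still happens, it is just built in.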

Another claim you made was that each Cassandra node is equal and that all nodes are always available. This sounds great, but naturally there are things that can go wrong outside the happy cloud that is called Cassandra. Your application still needs a fallback strategy for connecting to Cassandra if the machine a Cassandra node is running on is no longer reachable from your application. You can solve this with a local DNS, I suppose, or a proxy (HAProxy?), or something more resilient such as Keepalived, but I fail to see how this is any different than, for example, any other responsibility in your architecture. Unless you solve it on all levels, you always have the "single point of failure" you mentioned. Maybe it was my interpretation of your talk, but I found the "it just works" claims too naive a comparison to an RDBMS.
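As a rough illustration of the fallback idea, the Python sketch below walks a list of contact points and returns the first one that accepts a TCP connection. Everything here is an assumption made for the example: the host names are placeholders, `tcp_probe` and `first_reachable` are hypothetical helpers, and 9042 is used because it is Cassandra's default native-protocol port. Real client drivers handle this with their own contact-point and load-balancing logic.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def tcp_probe(address: Address, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address can be opened."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(contact_points: Iterable[Address],
                    probe: Callable[[Address], bool] = tcp_probe
                    ) -> Optional[Address]:
    """Return the first contact point that answers, or None if all fail."""
    for address in contact_points:
        if probe(address):
            return address
    return None


if __name__ == "__main__":
    # Placeholder host names; replace with real node addresses.
    nodes = [("cassandra-1.example", 9042), ("cassandra-2.example", 9042)]
    print(first_reachable(nodes))
```

Note that the list of nodes itself has to live somewhere the application can always reach, which is the "solve it on all levels" problem again.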

I'm a big fan of NoSQL solutions. But perhaps it's more interesting to make a comparison of Cassandra with Accumulo or perhaps ElasticSearch (it's not just an indexer).

Thanks for taking the time to read my two cents and thanks a lot for giving the talk!

Anonymous at 19:24 on 3 Apr 2015

A nice intro to Cassandra, with excellent little graphics. Expected to see something on shards, rather than simply nodes, but sticking to nodes helped people ask "what if node x dies" etc without getting overly complicated for a short talk.

The issue of a node overwriting earlier data because "timeStamps Yo! and Last Write Wins" is still always a problem, at least so long as timestamps instead of, say, vector clocks are used to determine who was really first. LWW has its own issues, so it would have been interesting to know whether a Cassandra user can set up other rules, the same way you can set your own quorum levels.
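A small Python sketch of the difference, for illustration only. It is not how Cassandra is implemented: `lww_merge` and `vc_compare` are hypothetical helpers, timestamps are plain integers, and a vector clock is modelled as a dict that maps a node name to a counter.

```python
from typing import Dict, Tuple

Write = Tuple[int, str]          # (timestamp, value)
VectorClock = Dict[str, int]     # node name -> counter


def lww_merge(a: Write, b: Write) -> Write:
    """Last Write Wins: keep whichever write carries the higher timestamp."""
    return a if a[0] >= b[0] else b


def vc_compare(a: VectorClock, b: VectorClock) -> str:
    """Order two vector clocks: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    # The node that wrote "old" has a clock running 5 ticks fast, so its
    # write carries the higher timestamp even though "new" happened later.
    print(lww_merge((105, "old"), (100, "new")))

    # Two writers that never saw each other's update: a vector clock
    # reports the conflict instead of silently picking one.
    print(vc_compare({"node_a": 2}, {"node_a": 1, "node_b": 1}))
```

With timestamps, clock skew decides the winner and the loser disappears without a trace. With vector clocks, the conflict is at least detectable, but something (usually the application) then has to resolve it.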

For Postgres users, 2ndQuadrant has made some Bi-Directional Replication core contributions to Postgres, which is pretty cool: https://wiki.postgresql.org/wiki/BDR_Project
(I bring up Postgres because many people using it take advantage of the built-in JSON type and hstore for when they need a mongo-y redissy k-v-pair something-something at the same time as an RDBMS.)

I found it interesting that the way to Cassandra was MySQL->Mongo->Cassandra. On the other hand, tons of sensors sending tons of data does indeed not sound like something that needs or wants 100% data reliability (compared to, say, people at an auction, which has lots of data writes and needs fast, non-stale reads, and where you can't screw it up).

I agree with Mark vd Velden in that the talk would have been cooler if Cassandra had been compared more with other NoSQLers (except Mongo, because, god... Mongo) in terms of failover, backups, stale data w/ high availability etc.
Still, I loved the talk. DOMCode can totally have more DB-related talks :)