Cassandra operation success ratio survey results

Compaction in Cassandra is known to degrade a node's performance enough that the node may miss some requests. The client therefore needs to handle these failures by retrying the operation against another working host. We have been storing performance data for every request we make to our five-node production Cassandra cluster.

We log the retry count and request type of each operation into our data warehouse. I have now extracted the data for a 10-day period and calculated how many retries an operation needs before it returns a result. The following table shows how many times an operation had to be retried before it completed successfully. The percentages can be read as probabilities, for example: “a request succeeds on the first try 99.933 % of the time.”

Total number of operations: 94 682 251 within 10 days.

Retries	Operations	Percentage of total operations
0	94618468	99.93263 %
1	56688	0.05987 %
2	5018	0.00530 %
3	1359	0.00144 %
4	111	0.00012 %
5	25	0.00003 %

There were also a few operations that needed more than five retries, so preparing to try up to ten times is not a bad idea.
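The arithmetic behind the table can be rechecked with a few lines of Python. This is only a sketch: the counts and the total are copied from the post, while the function names are made up for illustration. Note that the six rows add up to slightly less than the stated total. The difference is presumably the operations that needed more than five retries, but the post does not break them down, so treat that reading as an assumption.

```python
# Counts copied from the table: number of retries -> number of operations.
RETRY_COUNTS = {0: 94_618_468, 1: 56_688, 2: 5_018, 3: 1_359, 4: 111, 5: 25}
TOTAL_OPERATIONS = 94_682_251  # total stated in the post for the 10-day period


def share_by_retries(counts, total):
    """Percentage of all operations that needed exactly r retries."""
    return {r: 100.0 * n / total for r, n in counts.items()}


def cumulative_share(counts, total):
    """Percentage of all operations that succeeded with at most r retries."""
    running = 0
    result = {}
    for r in sorted(counts):
        running += counts[r]
        result[r] = 100.0 * running / total
    return result


def unlisted_operations(counts, total):
    """Operations in the total that do not appear in any table row."""
    return total - sum(counts.values())


if __name__ == "__main__":
    shares = share_by_retries(RETRY_COUNTS, TOTAL_OPERATIONS)
    cumulative = cumulative_share(RETRY_COUNTS, TOTAL_OPERATIONS)
    for r in sorted(RETRY_COUNTS):
        print(f"{r} retries: {shares[r]:.5f} %  (at most {r}: {cumulative[r]:.5f} %)")
    print("not listed in the table:", unlisted_operations(RETRY_COUNTS, TOTAL_OPERATIONS))
```

The cumulative column is the more useful one when choosing a retry limit, because it shows what fraction of requests a given limit would have covered.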

The cluster uses Cassandra 0.6.5 with RF=3. Dynamic snitching was not enabled. Each operation is executed until it succeeds or until it has been retried 10 times, using this PHP wrapper: http://github.com/dynamoid/cassandra-utilities
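The retry policy described here can be sketched roughly as follows. This is not the code of the PHP wrapper linked above. It is a minimal Python illustration, and every name in it (`execute_with_retries`, `OperationFailed`, the `operation` callable) is hypothetical. It assumes the operation is passed in as a function that takes a host and raises an exception on failure.

```python
import random

MAX_RETRIES = 10  # retry limit mentioned in the post


class OperationFailed(Exception):
    """Raised when the operation still fails after the last allowed retry."""


def execute_with_retries(operation, hosts, max_retries=MAX_RETRIES):
    """Call operation(host), switching to another host after each failure.

    Returns (result, retries), where retries is the number of failed
    attempts before the successful one. That number is what would be
    logged together with the request type.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    candidates = list(hosts)
    random.shuffle(candidates)  # avoid always starting from the same node
    last_error = None
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        host = candidates[attempt % len(candidates)]
        try:
            return operation(host), attempt
        except Exception as error:  # a real client should catch only timeout/unavailable errors
            last_error = error
    raise OperationFailed(f"gave up after {max_retries} retries") from last_error
```

Cycling through a shuffled host list means two consecutive attempts never hit the same node as long as the cluster has more than one, which matters when one node is busy compacting.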

Juho Garo Mäkinen's blog
