riak_redis_backend : thoughts
riak_redis_backend has migrated from a gist to a full-size GitHub project.
Finally, my code passes all the tests I throw at it. It’s been a fun ride so far. I still have a few ideas to implement, but the bulk of the work is done, and it’s not even 100 LOC …
Last week I decided that Riak would be my “week-end sized tech”. In the wee hours of Monday morning, I decided that writing a Redis backend for Riak would be fun and easy.
It was easy — in hindsight. But I hit a few roadblocks.
The riak_backend_tests passed quite quickly, but that was not enough. My own Riak test code tended to be slow, and mapreduce jobs would time out.
At first I thought about improving the performance of erldis (the Erlang Redis library). Together with Jacob Perkins, the erldis maintainer, I identified where to improve performance. Basically, it consisted of moving from strings/lists to binaries. Performance was 50% better as a result.
I also removed many useless checks on whether keys already belonged to sets. It is simpler to just write every time, as the data in that case is very short and Redis has very good write performance.
But my code still had mapreduce failures.
It all came from a misconception about that Partition argument in the start function … I ignored it and I was wrong. All the riak_redis_backends would connect to the same key space and uselessly exchange information until they timed out.
I tried to have partitions connect to different Redis databases. Not good, as a Redis server can only have 16 databases. [update: this is the default. Redis can be configured to handle way more in redis.conf]
So I prefixed the key name by
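To make the partition problem concrete, here is a minimal sketch in Python (not the backend's Erlang) of several backends sharing one Redis key space while staying isolated through a per-partition key prefix. Everything in it is an assumption for illustration: `FakeRedis` is an in-memory stand-in, `PartitionBackend` is a hypothetical class, and the `"<partition>:"` prefix format is made up, since the exact prefix used by riak_redis_backend is not given here.

```python
class FakeRedis:
    """In-memory stand-in for a single shared Redis key space (illustration only)."""

    def __init__(self):
        self.strings = {}
        self.sets = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))


class PartitionBackend:
    """Hypothetical backend: every key it touches is prefixed with its partition."""

    def __init__(self, redis, partition):
        self.redis = redis
        self.prefix = f"{partition}:"  # assumed prefix format, not the real one

    def _k(self, name):
        return self.prefix + name

    def put(self, bucket, key, value):
        bk = f"{bucket}/{key}"
        self.redis.set(self._k(bk), value)
        self.redis.sadd(self._k("keys"), bk)  # per-partition index of stored keys

    def get(self, bucket, key):
        return self.redis.get(self._k(f"{bucket}/{key}"))

    def list_keys(self):
        return sorted(self.redis.smembers(self._k("keys")))


# Two partitions share one key space but never see each other's data.
shared = FakeRedis()
p0 = PartitionBackend(shared, 0)
p1 = PartitionBackend(shared, 1)
p0.put("users", "alice", b"a")
p1.put("users", "bob", b"b")
print(p0.list_keys(), p1.list_keys())
```

Without the prefix, both backends would read and write the same `keys` set and each would report data it does not own, which is the kind of useless cross-talk described above.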
Using the synchronous version of erldis certainly slowed things down a bit as well. A put/delete operation is four Redis operations:
– adding to the bucket/key to the
– setting the bucket/key to the
– adding the key to the specific
– adding the Bucket to the buckets set.
An upcoming improvement will be to roll back if one of these operations fails (that’s an important one).
I sped things up by starting a process for each operation and waiting for the results. The four operations are done in parallel for better efficiency.
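The fan-out-and-wait idea can be sketched as follows. This is a Python approximation, not the backend's Erlang: threads stand in for Erlang processes, `run_parallel` is a hypothetical helper name, and the four operations in the demo are placeholders rather than the actual Redis commands.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(operations):
    """Start every operation at once, then wait for all of them.

    `operations` is a list of zero-argument callables. Results come back in
    the same order as the operations. If an operation raised, its exception
    is re-raised here when its result is collected.
    """
    if not operations:
        return []
    with ThreadPoolExecutor(max_workers=len(operations)) as pool:
        futures = [pool.submit(op) for op in operations]
        return [f.result() for f in futures]


def make_placeholder_op(name, delay=0.05):
    """Build a fake operation that sleeps to mimic a network round trip."""
    def op():
        time.sleep(delay)
        return name
    return op


# Four placeholder operations, standing in for the four Redis calls of a put.
ops = [make_placeholder_op(f"op{i}") for i in range(1, 5)]
start = time.time()
results = run_parallel(ops)
print(results, f"{time.time() - start:.2f}s")
```

Run one after the other, the four placeholder operations would take roughly four round trips; started together, the total wait is closer to the slowest single one. The catch is that a failure in one operation does not undo the others, which is why the rollback mentioned above matters.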
The code is still a bit slower on insert/delete than the DETS reference code, but consistently faster on mapreduce operations (see the riak-playground escript).
Will this code be useful?
I hope it can help. Both Riak and Redis are great, and they complement each other well. Redis is very fast, while Riak handles masterless replication, redundancy and mapreduce. So I do find them a great match.
For the time being, the problem is that Redis is limited to RAM-sized data sets. But it won’t last. antirez is committed to releasing a virtual memory version of Redis this year.
So that should not be an issue soon.
And is it really a problem? I see my code as mitigating this temporary issue!
I taught RDBMS for several years. I’m sorry Dr. Codd, but database systems have never been this fun.