riak_redis_backend : thoughts

Post Mortem

riak_redis_backend has migrated from a gist to a full size github project.

Finally, my code passes all tests I throw at it. It’s been a fun ride up to now. I still have a few ideas to implement, but the bulk of the work is done — and it’s not even 100 LOCs …

I recently spent some quality time with redis and the erldis erlang library for a project.

And last week I decided that Riak would be my “week-end sized tech”. In the wee hours of Monday morning, I decided that writing a Redis backend for Riak should be fun and easy.

It was easy — in hindsight. But I hit a few roadblocks.

The riak_backend_tests passed quite quickly, but that was not enough. My own riak test code tended to be slow, and mapreduce jobs would time out.

At first I thought about improving the performance of erldis (the erlang Redis library). With the help of Jacob Perkins, the erldis maintainer, I identified where to improve performance. Basically, it consisted of moving from strings/lists to binaries. Performance improved by about 50%.

Code available here

I also removed many useless checks on whether keys already belong to sets: just write every time, since the data in that case is very short and Redis has very good write performance.
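The reasoning, roughly, assuming erldis exposes Redis commands as functions on a client pid (e.g. erldis:sismember/3 and erldis:sadd/3): SADD is idempotent, so the membership check only costs an extra round trip.

    %% A minimal, illustrative sketch — not the actual backend code.

    %% Before: check membership first, paying an extra round trip.
    maybe_add(Client, Set, Member) ->
        case erldis:sismember(Client, Set, Member) of
            false -> erldis:sadd(Client, Set, Member);
            _     -> ok   %% already a member (return shape depends on the erldis version)
        end.

    %% After: SADD is idempotent, so just write every time.
    always_add(Client, Set, Member) ->
        erldis:sadd(Client, Set, Member).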

But my code still had mapreduce failures.

It all came from a misconception about the Partition argument in the start function … I ignored it, and I was wrong. All the riak_redis_backends would connect to the same key space and uselessly exchange information until timeout.

I tried to have partitions connect to different redis databases. Not good, as a Redis server can only have 16 databases. [update: that is just the default; Redis can be configured to handle many more in redis.conf]
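For reference, that limit is only the default and can be raised with the databases directive in redis.conf:

    # redis.conf: the number of logical databases defaults to 16
    databases 64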

So I prefixed every key name with node() and the Partition.
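Roughly, the idea looks like this (a sketch with illustrative names, not the actual backend code):

    %% Every key stored in Redis is namespaced with the owning node and
    %% partition, so vnodes no longer share a key space.
    prefix(Partition) ->
        list_to_binary([atom_to_list(node()), integer_to_list(Partition)]).

    k(Partition, Bucket, Key) ->
        P = prefix(Partition),
        BK = term_to_binary({Bucket, Key}),
        <<P/binary, BK/binary>>.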

Also, using the synchronous version of erldis certainly slowed things down a bit. A put/delete operation is four Redis operations (a rough sketch follows the list):

– adding the bucket/key to the world set.

– setting the bucket/key to the binary_to_term'ed Value.

– adding the key to the specific Bucket set.

– adding the Bucket to the buckets set.
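Put together, a put looks roughly like this, reusing the k/3 helper sketched above and assuming erldis command functions such as erldis:set/3 and erldis:sadd/3 (the world/1, bucket_set/2 and buckets/1 names are illustrative, not the backend's actual code):

    put_sketch(Client, Partition, Bucket, Key, Value) ->
        BK = k(Partition, Bucket, Key),
        erldis:sadd(Client, world(Partition), BK),                %% bucket/key into the world set
        erldis:set(Client, BK, Value),                            %% the value itself
        erldis:sadd(Client, bucket_set(Partition, Bucket), Key),  %% key into its bucket set
        erldis:sadd(Client, buckets(Partition), Bucket),          %% bucket into the buckets set
        ok.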

An upcoming improvement will be to roll back if one of these operations fails (that’s an important one).

I sped things up by spawning a process for each operation and waiting for the results: the four operations are done in parallel for better efficiency.
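A minimal sketch of that pattern (illustrative, not the exact backend code):

    %% Run each Redis command in its own process and wait for every reply.
    parallel(Cmds) ->
        Parent = self(),
        Refs = [begin
                    Ref = make_ref(),
                    spawn(fun() -> Parent ! {Ref, Cmd()} end),
                    Ref
                end || Cmd <- Cmds],
        [receive {Ref, Result} -> Result end || Ref <- Refs].

    %% e.g. parallel([fun() -> erldis:set(C, BK, Val) end,
    %%                fun() -> erldis:sadd(C, World, BK) end, ...]).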

The code is still a bit slower on insert/delete than the reference DETS backend, but consistently faster on mapreduce operations (see the riak-playground escript).

The future

Will this code be useful?

I hope it can help. Riak and Redis are both great and complement each other nicely: Redis is very fast, while Riak handles masterless replication, redundancy, and mapreduce. So I do find them a great match.

For the time being, the problem is that Redis is limited to RAM-sized data sets. But that won’t last: antirez is committed to releasing a virtual memory version of Redis this year.

So that should not be an issue soon.

And is it really a problem? I see my code as mitigating this temporary issue!

I taught RDBMS for several years. I’m sorry, Dr. Codd, but database systems have never been this fun.


2 comments so far

  1. Jacob

    Great to hear everything is working well and you were able to achieve higher mapreduce performance. Looking at riak_redis_backend.erl, I wonder if put & delete would work better with pipelining, as long as you don’t lose synchronous collection of replies. Some kind of batch sending on a per-From basis.

    Anyway, once I make sure the binary handling doesn’t compromise sharing data in redis with python apps, I’ll pull it in to my erldis repo. I’m looking forward to the improved performance :)

  2. cstar

    Yes, I am pretty sure pipelining would help. I did a poor man’s pipeline with my spawns, but it is definitely on the radar.

    My initial code was inserting in 30 secs when DETS was inserting in 5 secs. Now my code is within 6-8 sec, and pipelining should definitely kick DETS’s ass ;)

    I don’t foresee any problems wrt data sharing, but one should always be sure. Also, we might write a compatibility layer for the list API.

    Thanks Jacob for your help !

