SeaBeyond, the Process One XMPP event

Yesterday was the day it snowed in Paris for the first time this season.

But it was not the only event. People came to Paris from ten countries to attend the SeaBeyond meetup.

And it was a good one.


The official write-up has just been published.

But wait, here’s mine!

mod_pubsub

As someone who occasionally hacks on ejabberd, I always find it good to meet the developers. Christophe, mod_pubsub’s maintainer, hosted a great discussion on the subject.

Among the subjects: pubsub performance and the usual questions. How fast is pubsub? Should I use ODBC or Mnesia? Why two modules?

How fast is pubsub?

As of ejabberd 2.1, many improvements have been implemented. But how fast it is depends on how you use pubsub. Many nodes with few subscribers? Many subscribers? What is the subscription rate? How many items per node?

The last_item_cache option did a lot for performance, especially if you have high user churn.
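
To make the churn point concrete, here is a minimal Python sketch of the idea behind a last-item cache, assuming a storage backend that is comparatively expensive to read and a server that sends each node's last published item to users as they come online. It is not ejabberd's implementation; Storage, LastItemCache and their methods are hypothetical names. Publishing updates both the store and an in-memory copy, so the flood of "give me the last item" lookups caused by users logging in is served from memory.

```python
from typing import Dict, Optional


class Storage:
    """Hypothetical stand-in for a slow backend (Mnesia, ODBC, S3...); counts reads."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}
        self.reads = 0

    def write(self, node: str, item: str) -> None:
        self._items[node] = item

    def read_last(self, node: str) -> Optional[str]:
        self.reads += 1
        return self._items.get(node)


class LastItemCache:
    """Hypothetical last-item cache: one in-memory entry per pubsub node."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        self._last: Dict[str, Optional[str]] = {}

    def publish(self, node: str, item: str) -> None:
        self._storage.write(node, item)  # persist as usual
        self._last[node] = item          # and remember it in memory

    def last_item(self, node: str) -> Optional[str]:
        # Only a cache miss (e.g. right after a restart) touches storage.
        if node not in self._last:
            self._last[node] = self._storage.read_last(node)
        return self._last[node]


store = Storage()
cache = LastItemCache(store)
cache.publish("news", "<item id='1'/>")
for _ in range(1000):  # 1000 users logging in, each sent the last item
    cache.last_item("news")
print(store.reads)  # 0
```

The more users come and go, the more those lookups outnumber publications, which is why churn is where such a cache pays off.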

ODBC or Mnesia?

Vast question, but if you’ve got many nodes and many, many items, you’re better off with ODBC.

Why two modules?

The ODBC and Mnesia modules will never be merged. The ODBC one has undergone many optimisations that limit the number of queries (six times fewer since 2.0). It’s too bad we won’t get a storage backend abstraction: maintaining my S3/SimpleDB version is still a bit of work, and so would be pushing fancy NoSQL versions (Riak? Redis backends?). But keeping the modules separate means better performance for each backend.

There was more!

But I got sidetracked by an interesting discussion on Haskell with Erlang Solutions’ Mietek Bąk. Apologies to the rest of the guys at the pubsub table: we got quite enthusiastic and noisy, and put off our discussion until later.

Christophe said that since version 3 of ejabberd would use exmpp, one should get ready to rewrite one’s nodes and node_trees, but that performance would be much better with exmpp.

Many people, many discussions

I talked with one of the Nokia guys, who told me about the difficulties of being Nokia when you try to innovate: you have to please 250 mobile operators, all with different opinions. Especially when you try to get around their old, abusively expensive business, as Nokia is trying to do with Ovi.
I also toyed a bit with the N900. Nice phone.

Talked with Sebastian Geib, freelance sysadmin from Berlin, about working in Berlin/Germany, compared to Paris/France.

I also learned about Meetic’s chat architecture (overpowered) and how Erlang is viewed by sysadmins (not favorably by default :).

And presentations

The talks covered the admin panel for ejabberd, Jingle, the BBC’s use of PEPanon pubsub on ejabberd, Yoono, and Process One’s Wave Server.

The BBC’s use of PEPanon pubsub can be seen here, in the topmost Flash widget.

I had to leave early and missed the champagne and the Wave Server demo. But this talk by Mickaël Rémond was quite interesting. Quote of the day: “Google wants third-party Wave servers to be good, but not too good.”

Next year

I’ll be back.

Babies > unicorns


A strategy for testing ejabberd modules

I’ve always been looking for an elegant way of testing custom ejabberd modules.
I tried a couple of approaches before but was never convinced. Running tests against a live ejabberd node, for example: it’s not easy, there are many dependencies, and it’s hard to set up. Or mocking modules such as ejabberd_router: either I hit weird issues, or it was so cumbersome that I knew I’d never use it again.

But this time, I think I’ve got it.

Check out the cool combination of etap and erl_mock!

It’s on github with more blathering from yours truly.
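
The post does this in Erlang with etap and erl_mock. As a rough analogue of the same strategy, here is a Python sketch using the standard library’s unittest.mock: the module under test never reaches a real router, because the router is swapped for a mock and the test asserts on what would have been routed. Router, route and echo_module are hypothetical stand-ins for ejabberd_router and a custom module, not ejabberd APIs.

```python
from unittest import mock


class Router:
    """Hypothetical stand-in for ejabberd_router."""

    @staticmethod
    def route(sender: str, recipient: str, packet: dict) -> None:
        raise RuntimeError("would need a running ejabberd node")


def echo_module(sender: str, recipient: str, packet: dict) -> None:
    """Hypothetical custom module: echoes chat messages back to the sender."""
    if packet.get("type") == "chat":
        Router.route(recipient, sender, {"type": "chat", "body": packet["body"]})


def test_chat_is_echoed() -> None:
    # Replace the router for the duration of the test and inspect the call.
    with mock.patch.object(Router, "route") as route:
        echo_module("alice@example.com", "echo@example.com",
                    {"type": "chat", "body": "hi"})
        route.assert_called_once_with("echo@example.com", "alice@example.com",
                                      {"type": "chat", "body": "hi"})


def test_other_types_are_ignored() -> None:
    with mock.patch.object(Router, "route") as route:
        echo_module("alice@example.com", "echo@example.com",
                    {"type": "headline", "body": "hi"})
        route.assert_not_called()


test_chat_is_echoed()
test_other_types_are_ignored()
```

The point of the design is that the test only depends on the router’s interface, so no node, configuration or network is needed to exercise the module’s logic.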

I am so testing the post by email feature

I’ve attached a picture and underlined things. (ed: the latter seems to have failed utterly)

martine utf8

Now I am trying to post from TextMate.

Small update

Since the migration to WordPress, I haven’t published anything new. Maybe the biggest hiatus since I started blogging.

In a nutshell:

Stopped smoking

114 days since last smoke. Feeling good about it. Saved around 500€.

Got the new Toyota Prius

If you’re looking for a comfortable, silent, geeky car (for a geek with a family), that’s the one. Fuel economy is the same as ye olde Twingo, but with twice the horsepower and twice the number of doors. And no manual shifting.

Getting prepped for the new baby

The boy (yes, another boy) is expected within a month.

Martin

Cute and funny: check
Speech: check
iPhone skills: check
Scribblenauts skills: check

Work related stuff

Now playing with Drupal. Very interesting, and making good use of PHP.
I’m doing quite a lot of JavaScript.
It’s the most important language nowadays, and it should be learnt as soon as possible.

Protip: use and peruse Douglas Crockford’s JavaScript: The Good Parts.

The French version is also available.

I still have quite a lot of Erlang to write, and it’s still very enjoyable.

Upcoming

I will “review” the Prius (in French) someday.

Continuation

Migrated the blog to this location.

Some links may be broken, and some assets have not been re-uploaded here yet.

Why fork the whole ejabberd tree?

I was asked this question on PlanetErlang:

Why have you put whole ejabberd source to the repository? You could just put your modules to avoid constant merging from upstream.

Thank you, Anton, for enabling me to express some love to git and github.

The short answer

It’s easy and fun.

The longer answer

The early version of the code was actually in a separate private SVN repository. Part of my install procedure was copying the beams into the ejabberd ebin folder. But each time the mod_muc or mod_pubsub modules were updated upstream, I had to launch FileMerge and merge things. And those modules are not slim.

Enter git and github. Brian J. Cully has a script that updates his ejabberd repository on github every hour from the Process One SVN repository.

My own ejabberd repository is a fork of his.

And keeping my own tree up to date takes only one (1) command:

github pull bjc master

Run sudo gem install github to install the github gem.

Merges are done automatically. Of course, the occasional conflict may arise, but no process would let me avoid that.

Pushing to my github repository is also one command :

git push origin master

And if I want to send a patch right up to Process One?

Say, for pubsub:

git diff bjc/master -- src/mod_pubsub > pubsub.patch

Contributing is easy

Fork my project, hack, push, pull request.

Can it be any simpler? (This question is not rhetorical.)

ejabberd “cloud edition alpha”

Objectives

It’s an ejabberd-based proof of concept, with a set of custom modules aimed at making it stateless and very scalable on the storage backend.

All state data (including user accounts, roster information, persistent conference rooms, pubsub nodes and subscriptions) is stored in AWS web services: S3 or SimpleDB.

It helps with scaling up and down, and keeps costs proportional to usage. AWS services are very wide, and massively parallel access is what they’re all about.

The default ejabberd configuration uses Mnesia, but Process One recommends switching some services, like roster or auth, to ODBC when load increases.

But DBMSs have their own scaling problems, and they are yet another piece of software to administer.

CouchDB seems like loads of fun, and I’d like to put some effort into running ejabberd over it later on. Some work has started, but there’s not much progress yet (and CouchDB is still software one needs to manage).

Current state

  • ejabberd_auth_sdb: stores users in SimpleDB. The version on github stores passwords encrypted, but forces PLAIN password authentication over XMPP, which means that TLS is required (really!). I have a version somewhere that exchanges hashes on the wire but stores passwords in clear text in SimpleDB. Your call.

  • mod_roster_sdb: roster information is stored in SimpleDB.

  • mod_pubsub: nodetree data is stored in S3 along with items. Subscriptions are stored in SimpleDB. I reimplemented nodetree_default and node_default, which means that PEP works fine too.

  • mod_muc: uses modular_muc with S3 storage for persisting rooms.

  • mod_offline: S3 for storing offline messages.

  • mod_last_sdb: stores last activity in SimpleDB.
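
The ejabberd_auth_sdb trade-off comes down to what the server can verify. Assuming the stored form is a salted one-way hash, the server can only check a password that the client sends in the clear, hence PLAIN, hence TLS. The post does not say which scheme the module uses; PBKDF2 below is a swapped-in choice, and make_record, check_plain and ITERATIONS are hypothetical names for this Python sketch.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: a tunable work factor, not taken from the module


def make_record(password: str) -> dict:
    """Build what would be stored per user: a random salt and a one-way hash."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return {"salt": salt.hex(), "hash": digest.hex()}


def check_plain(record: dict, password: str) -> bool:
    """Verify a password received in the clear (SASL PLAIN) against the record."""
    salt = bytes.fromhex(record["salt"])
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest.hex(), record["hash"])


record = make_record("s3cret")
print(check_plain(record, "s3cret"))  # True
print(check_plain(record, "wrong"))   # False
```

The other version makes the opposite choice: the wire carries only hashes, but the server needs the password itself (or an equivalent) at rest to compute them.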

Still lacking:

For each remaining module, here is where I think its data should be stored:

  • mod_shared_roster: in SimpleDB

  • mod_vcard: vCards in S3, index in SimpleDB

  • mod_private: S3

  • mod_privacy: S3

  • mod_muc_log: S3 (maybe with a specific setting for direct serving)

These modules are the only ones with state that should be persisted on disk. Mnesia is of course still used for routing and configuration, but that’s transient data.

Transactions and latency

We lose transactions by switching away from Mnesia or ODBC. That may or may not be a problem. I think it won’t be, but I don’t have data to prove it one way or the other.

Latency also grows, but erlsdb and erls3, the libraries the modules are built on, can interface with memcached (and are ketama-enabled) if you use merle. Using merle will also keep usage costs down.
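
“Ketama-enabled” means keys are spread over the memcached servers with consistent hashing, so adding or removing a cache server only remaps a fraction of the keys. Here is a simplified Python sketch of a ketama-style ring, assuming unweighted servers and 160 points per server, with hashing modelled on libketama’s MD5 scheme; it is not guaranteed to map keys identically to merle or libketama, and the server names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


class KetamaRing:
    """Simplified, unweighted ketama-style consistent hash ring."""

    def __init__(self, servers: List[str], points_per_server: int = 160) -> None:
        ring: List[Tuple[int, str]] = []
        for server in servers:
            # Each 16-byte MD5 digest yields four 32-bit points on the circle.
            for i in range(points_per_server // 4):
                digest = hashlib.md5(f"{server}-{i}".encode()).digest()
                for j in range(4):
                    point = int.from_bytes(digest[4 * j:4 * j + 4], "little")
                    ring.append((point, server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    def server_for(self, key: str) -> str:
        h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
        # First point at or after the key's hash, wrapping around the circle.
        i = bisect.bisect_left(self._points, h) % len(self._points)
        return self._servers[i]


three = KetamaRing(["cache1:11211", "cache2:11211", "cache3:11211"])
four = KetamaRing(["cache1:11211", "cache2:11211", "cache3:11211", "cache4:11211"])
keys = [f"roster:user{n}@example.com" for n in range(1000)]
moved = [k for k in keys if three.server_for(k) != four.server_for(k)]
# Keys that moved all went to the new server; the others stayed put.
print(len(moved), all(four.server_for(k) == "cache4:11211" for k in moved))
```

With a naive “hash modulo number of servers” scheme, going from three to four servers would remap most keys and empty most of the cache at once.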

ejabberd’s mod_pubsub underwent several optimizations recently, which improved the performance of the non-memcached AWS mod_pubsub. The initial code had a latency of around 10 seconds between publishing an item and receiving the event. Since last week’s improvements, performance is much better.

Down the road

I’d like to see an EC2 AMI based on this code: just pass the domain name or the ejabberd.cfg file to ec2-start-instance and boom! You have an ejabberd server up and running.

Want more horsepower? Start another one on the same domain in the same EC2 security group: the ejabberd nodes autodiscover each other and you’ve got a cluster. ec2nodefinder is designed for exactly this use.

Combined with the very neat upcoming load-balancing and autoscaling services from Amazon Web Services, there’s a great opportunity for deploying big and cheap!

Alternatives to AWS load balancing would be pen, or a “native” XMPP solution.

A few things would need to be implemented for this to work well, like XMPP fast reconnect via resumption and/or C2S/S2S process migration between servers, because scaling down is as important as scaling up in the cloud.

If you want to participate, you’d be very welcome. Porting the modules I did not write, or testing and sending feedback would be … lovely.

And of course, if Process One wants to integrate this code in one way or another, that would also be lovely!

Get it

Get it, clone it, fork it! There’s a bit of documentation on the README page.

[edited: added links to XEP-0198 and rfc3920bis-08, thanks to Zsombor Szabó for pointing me to them]

Avishai Cohen at the Bataclan

Thanks/Credits

I owe a lot to Caféine.

He introduced me to Avishai Cohen last year with this tweet.

And last week, he had a spare ticket for a sold-out concert at a packed Bataclan, despite there being two dates. The November 24, 2009 concert at the Alhambra is already sold out too.

And he thought of me for a great seat in the 4th row, right in front of the percussionist (always the most spectacular member of a band).

Thanks, Arnaud!

Let me listen to Aurora, Cohen’s latest album, and get back into the mood. Here we go.

The concert

A double bass/piano/guitar/percussion quartet, plus a singer.

All complete virtuosos, with shaved heads (no, not the singer).

“Classic” jazz inspirations blended with flamenco and Yiddish music.

A special mention for the percussionist, 25 at most, with tremendous energy and precision.

You could tell they were enjoying themselves. It was joyful and magnificent.

At the fourth encore, Cohen came to apologize for not being able to keep playing: it was past 10 p.m. and there was a curfew. Otherwise we would have been in for the whole night… and we would all have stayed!

A few random notes:

  • Avishai Cohen gave the others plenty of room. We had to wait until the second half of the concert for a double bass solo.

  • A magnificent guitar-piano duet (I think it was in the piece Leolam) that gave me chills all over.

  • Cohen played two pieces with two strings missing on his double bass. While the strings were being replaced, he brought out the electric bass and treated us to a terrific bass solo.

  • Every part of a double bass can be used to make music: the body for rhythm, the strings below the bridge for harmonics.

  • Between pieces, Cohen knew how to make us laugh: “My parents are in the audience tonight, I’d better play well.”

  • Cohen sings, and he sings well!

I didn’t know any of the pieces; they all seemed to come from his new album, except for an encore of Remembering, as a bass-piano duet.

A note for the audience: applaud the solos, of course; but clapping along in rhythm is off-limits, because it’s a pain for those who want to listen to the music. Unless the musicians ask for it.

The members of the quartet

“Amos Hoffman’s oud, Itamar Doari’s creative percussion, Shai Maestro’s delicate touch, Karen Malka’s unique voice.”
I hadn’t caught their names during the concert, but I owed it to them to mention them too.


erlsdb and erls3 use ibrowse

I had some issues with inets under heavy load with erlsdb and erls3.

And when you are talking to Amazon Web Services, you want to write in parallel as much as possible. You also want to pipeline requests over a single socket, especially when using SSL encryption (where connections are even more costly to establish).

ibrowse seemed very interesting, especially since the CouchDB project started using it!

I got it out of jungerl, which is always a bit of a pain. I later found out that it’s also on github.

Porting my code to ibrowse was quite easy, though I had to change some of the async code. Where inets sends one message once it has received the whole HTTP response, ibrowse sends a message upon receiving the headers, then a slew of messages, one for each chunk it receives.
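
Here is a Python sketch of the reassembly the async code now has to do. The message shapes are modelled on ibrowse’s stream_to messages (ibrowse_async_headers, ibrowse_async_response, ibrowse_async_response_end), written as tuples; check the ibrowse documentation for the exact formats. In Erlang this would be a receive loop, but the folding logic is the same: open an entry on headers, append chunks, close on end.

```python
from typing import Dict, Iterable


def assemble(messages: Iterable[tuple]) -> Dict[str, dict]:
    """Fold streamed (tag, req_id, ...) messages into complete responses."""
    pending: Dict[str, dict] = {}
    done: Dict[str, dict] = {}
    for msg in messages:
        tag, req_id = msg[0], msg[1]
        if tag == "ibrowse_async_headers":
            _, _, status, headers = msg
            pending[req_id] = {"status": status, "headers": headers, "chunks": []}
        elif tag == "ibrowse_async_response":
            pending[req_id]["chunks"].append(msg[2])
        elif tag == "ibrowse_async_response_end":
            response = pending.pop(req_id)
            response["body"] = b"".join(response.pop("chunks"))
            done[req_id] = response
    return done


# Two requests in flight, their messages interleaved.
stream = [
    ("ibrowse_async_headers", "req1", "200", [("Content-Type", "text/xml")]),
    ("ibrowse_async_headers", "req2", "404", []),
    ("ibrowse_async_response", "req1", b"<Item>"),
    ("ibrowse_async_response_end", "req2"),
    ("ibrowse_async_response", "req1", b"</Item>"),
    ("ibrowse_async_response_end", "req1"),
]
responses = assemble(stream)
print(responses["req1"]["body"])  # b'<Item></Item>'
```

Keying everything on the request id is what lets many parallel requests share one receiving process.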

I got a few Too Many Open Files errors while load testing. As it turned out, I had over 500 connections open to Amazon AWS. Setting more sensible defaults made the problem go away.
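
The post doesn’t say which defaults were changed. ibrowse does offer per-host limits such as max_sessions and max_pipeline_size, but the value below is hypothetical. The general fix can be sketched in Python as a per-host cap: many parallel callers, but never more than a fixed number of requests open at once. This sketch caps concurrency only and does not model pipelining; HostLimiter and fake_s3_put are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_SESSIONS = 10  # hypothetical per-host cap


class HostLimiter:
    """Caps how many requests to one host are in flight at the same time."""

    def __init__(self, max_sessions: int) -> None:
        self._slots = threading.BoundedSemaphore(max_sessions)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def request(self, work):
        with self._slots:  # blocks while max_sessions requests are already open
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return work()
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_s3_put() -> str:
    time.sleep(0.001)  # stands in for a network round trip
    return "ok"


limiter = HostLimiter(MAX_SESSIONS)
with ThreadPoolExecutor(max_workers=100) as pool:  # 100 eager callers
    results = list(pool.map(lambda _: limiter.request(fake_s3_put), range(500)))
print(limiter.peak <= MAX_SESSIONS)  # True
```

Callers beyond the cap simply wait for a slot instead of opening yet another socket, which is what keeps the file descriptor count bounded.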

Configuration is per host, which forced me to change how S3 buckets are addressed, from http://bucket.s3.amazonaws.com/ to http://s3.amazonaws.com/bucket/
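
Why the addressing matters: since the configuration is per host, one hostname per bucket means one set of connections and limits per bucket, while path-style URLs put every bucket behind a single host. A Python sketch of the two URL styles follows; the bucket names are hypothetical, and the functions only build URLs, they don’t talk to S3.

```python
from urllib.parse import quote, urlparse

S3_HOST = "s3.amazonaws.com"


def virtual_hosted_url(bucket: str, key: str) -> str:
    # One hostname per bucket: http://bucket.s3.amazonaws.com/key
    return f"http://{bucket}.{S3_HOST}/{quote(key)}"


def path_style_url(bucket: str, key: str) -> str:
    # A single hostname for every bucket: http://s3.amazonaws.com/bucket/key
    return f"http://{S3_HOST}/{bucket}/{quote(key)}"


buckets = ["rooms", "pubsub-items", "offline"]  # hypothetical bucket names
print({urlparse(virtual_hosted_url(b, "k")).netloc for b in buckets})  # three distinct hosts
print({urlparse(path_style_url(b, "k")).netloc for b in buckets})      # {'s3.amazonaws.com'}
```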

One caveat: accessing SimpleDB over SSL gives InvalidSignature errors for the time being. I’ll squash that soon.

Using ibrowse will also enable me to write an S3 client that streams files to and from disk.

The ibrowse versions are in the ibrowse branch of both projects.