Big Data and Ruby

Update: we've manually withdrawn this proposal, so there is no confusion about voting for it. If you're willing to give a presentation about 'big data', propose one now!

Anybody out there doing serious Big Data work with Ruby?


    Vestibule isn't really suited to discussion of this kind; instead I suggest that you use the mailing list to gather some support and find someone willing to propose a talk.

    I've had good results in the past using Wukong from Infochimps, which uses Hadoop streaming.

    I tried.

    We generate about 1TB of JSON log data per day, and we're starting to use a Hadoop cluster to analyse it.

    I wrote some map/reduce jobs using the Mandy gem, but ended up re-implementing them in Java (which was exactly as much fun as you'd expect) because it was somewhere between 5 and 7 times faster, IIRC.

    To be fair, I think the difference is mainly because Mandy depends on Hadoop streaming, and the Java code doesn't.

    JRuby would probably have been almost as good, if not just as good.
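
    For anyone who hasn't used Hadoop streaming: the framework pipes each record as a line of text into an external process on stdin and reads tab-separated key/value lines back from stdout. A rough sketch of that contract in plain Ruby is below. It is not Mandy's or Wukong's API; the method names and the "status" field are made up for illustration, and the log format is assumed to be one JSON object per line.

```ruby
require "json"

# Map step: take one JSON log record per line and emit "key\t1".
# The "status" field is a hypothetical example, not taken from the thread.
def map_lines(lines)
  lines.each_with_object([]) do |line, out|
    begin
      record = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end
    next unless record.is_a?(Hash)
    out << "#{record.fetch('status', 'unknown')}\t1"
  end
end

# Reduce step: the input arrives sorted by key, so all lines for one key
# are adjacent. Sum the values for each run of identical keys.
def reduce_lines(sorted_lines)
  out = []
  current_key = nil
  total = 0
  sorted_lines.each do |line|
    key, value = line.chomp.split("\t", 2)
    if key != current_key
      out << "#{current_key}\t#{total}" unless current_key.nil?
      current_key = key
      total = 0
    end
    total += value.to_i
  end
  out << "#{current_key}\t#{total}" unless current_key.nil?
  out
end
```

    In a real streaming job the mapper and reducer would be separate scripts, each looping over STDIN and writing with puts. Every record crosses a process boundary as text and gets re-parsed on the other side, which is one likely source of the overhead compared with Java code running inside the Hadoop JVM.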