Big Data and Ruby
This proposal has been withdrawn...
Update: we've manually withdrawn this proposal, so there is no confusion about voting for it. If you're willing to give a presentation about 'big data', propose one now!
Anybody out there doing serious Big Data work with Ruby?
Vestibule isn't really suited to discussion of this kind; instead I suggest that you use the mailing list to gather some support and find someone willing to propose a talk.
I've had good results in the past using Wukong from Infochimps (https://github.com/infochimps-labs/wukong), which uses Hadoop streaming.
We generate about 1TB of JSON log data per day, and we're starting to use a Hadoop cluster to analyse it.
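As a rough sketch of what a Hadoop streaming job over logs like these involves: streaming runs any executable as the mapper, feeds it input records as lines on stdin, and collects tab-separated key/value lines from its stdout. The plain-Ruby mapper below is not the API of Wukong or Mandy. It assumes one JSON object per line and groups by a hypothetical `path` field, and the `map` command-line argument is likewise only an illustrative convention.

```ruby
#!/usr/bin/env ruby
require 'json'

# Hypothetical field to group log records by; adjust to your schema.
KEY_FIELD = 'path'

# Converts one JSON log line into a "key\t1" record (Hadoop streaming
# splits key from value at the first tab by default). Returns nil for
# malformed JSON, non-object values, or records without KEY_FIELD.
def map_line(line)
  record = JSON.parse(line)
  return nil unless record.is_a?(Hash) && record.key?(KEY_FIELD)

  "#{record[KEY_FIELD]}\t1"
rescue JSON::ParserError
  nil
end

# Processes input one line at a time, so the mapper never holds more
# than a single log line in memory.
def run_mapper(input, output)
  input.each_line do |line|
    record = map_line(line)
    output.puts(record) if record
  end
end

# Hypothetical convention: invoked as "log_mapper.rb map" by Hadoop streaming.
run_mapper($stdin, $stdout) if ARGV.first == 'map'
```

A script like this is handed to the streaming jar through its `-mapper` option. By default the framework sorts mapper output by key before the reduce phase, so all records for one path reach the same reducer consecutively.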
I wrote some map/reduce jobs using the Mandy gem, but ended up re-implementing them in Java (which was exactly as much fun as you'd expect) because the Java version was somewhere between 5 and 7 times faster, IIRC.
To be fair, I think the difference is mainly because Mandy depends on Hadoop streaming and the Java code doesn't.
JRuby would probably have been almost as good, if not just as good.
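The streaming overhead is easier to picture from the reducer side. With the Java API, Hadoop calls `reduce` with a key and the values grouped under it. With streaming, the reducer only receives the sorted mapper output as text lines on stdin, so every record is serialized to text, piped between processes and parsed again, and the script has to detect key boundaries itself. That is one plausible contributor to the gap, not a measured breakdown. Below is a minimal plain-Ruby sketch of a reducer that sums `key\tcount` records; the `reduce` argument is a hypothetical convention.

```ruby
#!/usr/bin/env ruby

# Sums counts per key from Hadoop streaming reducer input: "key\tcount"
# lines, already sorted by key so equal keys arrive next to each other.
# There is no per-key iterator here; the script parses every line and
# notices key changes itself.
def reduce_stream(input, output)
  current_key = nil
  total = 0
  input.each_line do |line|
    next if line.strip.empty? # ignore stray blank lines

    key, value = line.chomp.split("\t", 2)
    if key != current_key
      # Key changed: emit the finished group before starting a new one.
      output.puts("#{current_key}\t#{total}") unless current_key.nil?
      current_key = key
      total = 0
    end
    total += value.to_i # nil.to_i and non-numeric strings give 0
  end
  # Emit the last group, if any input was seen.
  output.puts("#{current_key}\t#{total}") unless current_key.nil?
end

# Hypothetical convention: invoked as "count_reducer.rb reduce" by Hadoop streaming.
reduce_stream($stdin, $stdout) if ARGV.first == 'reduce'
```

A script like this is passed to the streaming jar through its `-reducer` option. Emitting each group as soon as the key changes keeps memory use flat regardless of how many records share a key.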