Twitter has revealed plans to open source Storm, the Hadoop-like analytics platform it gained through its acquisition of BackType last month.
Hadoop is an open source framework for distributed storage and batch processing of large data sets. BackType has previously called Storm the “Hadoop of real-time processing”.
While Hadoop runs finite “MapReduce jobs” with queues and workers, Storm's “topology” processes messages forever or until it is actively switched off, BackType’s Nathan Marz wrote on Twitter’s Engineering blog.
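As a rough illustration of that distinction, and not of Storm's or Hadoop's actual APIs, the Python sketch below contrasts a job that runs once over a finite input with a computation that keeps emitting updated results for as long as messages arrive. All function names here are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def batch_word_count(messages: Iterable[str]) -> Counter:
    """Batch style: the input is finite, so the job runs once and returns."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def streaming_word_count(messages: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: yield an updated snapshot after every message.

    The loop ends only if the source ends, i.e. it is switched off.
    """
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
        yield Counter(counts)  # copy so earlier snapshots are not mutated


def endless_source() -> Iterator[str]:
    """Hypothetical unbounded message source standing in for a live feed."""
    while True:
        yield "storm is live"
        yield "hadoop is batch"


if __name__ == "__main__":
    # The batch job finishes on its own once the input is exhausted.
    print(batch_word_count(["storm is live", "hadoop is batch"]))

    # The streaming version never finishes by itself; islice plays the
    # role of the operator switching it off after three messages.
    for snapshot in islice(streaming_word_count(endless_source()), 3):
        print(snapshot)
```

The streaming function is written as a generator so that results are available after each message, which is the property that makes this style suitable for live displays; a real system would additionally distribute the work across machines.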
The most notable users of Hadoop clusters were Yahoo and Facebook; however, in recent months hardware vendors such as Dell and EMC have delivered pre-configured Hadoop stacks for enterprise customers.
Twitter also used a distribution of Hadoop built by Cloudera, Dell’s partner for its recently announced offering, according to Read Write Web.
Storm’s advantages over Hadoop, according to Marz, were that it was fault tolerant and avoided the need for queues when updating a database; that it supported continuous computation suitable for streaming Twitter’s trending topics to a browser; and that it could run “intense” queries in parallel.
"It abstracts the message passing away, automatically parallelizes the stream computation on a cluster of machines, and lets you focus on your realtime processing logic," Marz explained in an earlier blog post for BackType.
Another standout feature was “Storm's awesome automated deploy”, which allowed a user to create a Storm cluster on Amazon’s EC2 cloud “with just the click of a button”, he said.
Prior to its acquisition by Twitter, BackType had released a successful product, BackTweets, which offered companies a sales lead-generation system for tracking who their messages were reaching.