While Hadoop 1.0 (the basis of the current distributions) is being adopted at increasing speed, Hadoop 2.0 has already made its debut. It promises to overcome some of Hadoop 1.0's limitations, such as scalability, cluster utilisation, agility, and the inability to process data without MapReduce.
Hadoop 1.0 does what it promises brilliantly. MapReduce is the backbone of Hadoop 1.0. It is very good for batch processing but of little help for real-time and near-real-time processing. Moreover, any job has to be written as, or converted into, a MapReduce job before it can run. MapReduce is great for certain types of work but is not a fit for all of them. In terms of resource management, Hadoop 1.0 and MapReduce do not guarantee full or efficient cluster utilisation, because each node's capacity is split into fixed map and reduce slots that can sit idle while other work waits.
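To make "converted into a MapReduce job" a little more concrete, here is a minimal sketch of the classic word count expressed as map, shuffle and reduce steps. It is written in plain Python rather than against Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical; it only simulates on one machine what Hadoop distributes across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper (hypothetical name): emit (word, 1) for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all mapped values by key, as the framework does between phases
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer (hypothetical name): combine all values collected for one key
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    # Single-machine simulation of one job: map every line, shuffle, then reduce per key
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"]))
    # prints {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Every job has to be recast into this mapper/reducer shape, whether or not the problem naturally fits it; an iterative algorithm, for example, ends up as a chain of such jobs, which is part of why MapReduce suits batch work better than iterative or interactive processing.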
YARN (Yet Another Resource Negotiator) comes to the rescue for all of the above in Hadoop 2.0 and becomes its new backbone. It lets any application (job) run in Hadoop while ensuring effective resource utilisation. MapReduce now simply runs as one application on top of YARN. As a developer, I am particularly interested because YARN also provides a development framework for creating applications, potentially in any language, so developers are not limited to Java.
Well, that's Hadoop 2.0. Last Monday, Raghu Ramakrishnan, Technical Fellow and CTO of Information Services at Microsoft, announced at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) in Chicago that Microsoft is working on another framework that will run on top of YARN. REEF (Retainable Evaluator Execution Framework) is a set of libraries that provides support for task monitoring and restart, data movement and communications, and distributed state management. It should be well suited to implementing iterative algorithms for graph analytics and machine learning.
Ramakrishnan also said that Microsoft would open-source REEF within a month. I am quite excited and will keep following it. As it is a Microsoft offering, I also expect that it will be possible to write programs on top of REEF using .NET languages.