Iterative MapReduce for Azure Cloud

The MapReduce distributed data processing architecture has become the de facto mechanism for data-intensive analysis in compute clouds and commodity clusters, mainly due to its fault tolerance, scalability, ease of use, and simple programming model. MapReduceRoles for Azure (MR4Azure) is a decentralized, dynamically scalable MapReduce runtime that we developed for the Windows Azure cloud platform, using Microsoft Azure cloud infrastructure services as building blocks. This paper presents Twister4Azure, which adds support for optimized iterative MapReduce computations to MR4Azure, based on the concepts of the Twister iterative MapReduce framework. Twister4Azure enables a wide array of large-scale iterative data analysis and scientific applications to use the Azure platform easily and efficiently, while preserving the fault tolerance and the decentralized, dynamic scheduling of MR4Azure. Both MR4Azure and Twister4Azure take advantage of the scalability, high availability, and distributed nature of cloud infrastructure services to avoid single points of failure, bandwidth bottlenecks, and management overheads.
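To make the term "iterative MapReduce" concrete, the following is a minimal single-process sketch of the general pattern: a driver repeatedly runs a map phase and a reduce phase over the same static input, feeding each reduce output into the next iteration until the result stops changing. It uses one-dimensional k-means as the example computation. This is not the Twister4Azure or MR4Azure API; the function names (`map_phase`, `reduce_phase`, `iterative_mapreduce`), the parameters, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for each input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous value.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def iterative_mapreduce(points, centroids, max_iterations=100, tolerance=1e-9):
    """Driver: rerun map and reduce over the same static data until convergence."""
    for iteration in range(1, max_iterations + 1):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tolerance:
            break
    return centroids, iteration


# Hypothetical sample data: two well-separated clusters on a line.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, iterations = iterative_mapreduce(points, [0.0, 5.0])
```

In this sketch only the small, changing value (the centroids) moves between iterations, while the large input (`points`) stays fixed. That split between static and loop-variant data is the general property that iterative MapReduce runtimes can exploit; how Twister4Azure does so is not shown here.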
