Granules: A lightweight, streaming runtime for cloud computing with support for Map-Reduce

Cloud computing has gained significant traction in recent years. The Map-Reduce framework is currently the dominant programming model in cloud computing settings. In this paper, we describe Granules, a lightweight, streaming-based runtime for cloud computing that incorporates support for the Map-Reduce framework. Granules provides rich lifecycle support for developing scientific applications, with iterative, periodic, and data-driven semantics for individual computations and pipelines. We describe our support for variants of the Map-Reduce framework and present a survey of related work in this area. Finally, we describe our performance evaluation of various aspects of the system, including (where possible) comparisons with comparable systems.
