MR-LEGOS: a refined MapReduce model

MapReduce is a parallel programming model that is proven to scale. However, using low-level MapReduce for general data processing tasks raises the problem of developing, maintaining and reusing custom low-level user code. Several frameworks have emerged to address this problem. We highlight several issues in these approaches and propose an alternative: a novel refined MapReduce model (MR-LEGOS), an explicit model for composing MapReduce constructs from simpler components, namely ‘Maplets’, ‘Reducelets’ and, optionally, ‘Combinelets’. This composition can be viewed as defining a micro-workflow inside the MapReduce job. Using MR-LEGOS, complex problem semantics can be defined in the encompassing micro-workflow while the building blocks stay simple. The model is analogous to LEGO bricks: a collection of standard, reusable, predefined bricks helps define complex processing tasks efficiently. We present the design details, usage scenarios and performance experiments, and highlight the main features of MR-LEGOS.
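The idea of a micro-workflow built from small bricks can be illustrated with a minimal, single-process sketch. This is not the MR-LEGOS API: the names `compose_maplets`, `run_job`, `tokenize`, `lowercase_key`, `drop_short` and `sum_values` are hypothetical, and the in-memory grouping only stands in for the shuffle phase of a real MapReduce runtime. The sketch assumes that a maplet takes one key-value pair and returns zero or more pairs, and that a reducelet takes a key with its grouped values.

```python
from collections import defaultdict

def compose_maplets(*maplets):
    """Chain maplets into one map function (a hypothetical micro-workflow)."""
    def mapper(record):
        records = [record]
        for maplet in maplets:
            # Each maplet may emit zero, one or many pairs per input pair.
            records = [out for r in records for out in maplet(r)]
        return records
    return mapper

def run_job(inputs, mapper, reducer):
    """Toy in-memory stand-in for map, shuffle and reduce."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Illustrative maplets: each one does a single simple thing.
def tokenize(record):
    _, line = record
    return [(word, 1) for word in line.split()]

def lowercase_key(record):
    key, value = record
    return [(key.lower(), value)]

def drop_short(record):
    key, _ = record
    return [record] if len(key) > 2 else []

# Illustrative reducelet: sum the values grouped under each key.
def sum_values(key, values):
    return [(key, sum(values))]

mapper = compose_maplets(tokenize, lowercase_key, drop_short)
result = run_job([(0, "The cat and the hat"), (1, "A cat")], mapper, sum_values)
print(result)
```

Under these assumptions, the word-count semantics live in the order in which the bricks are composed, while each brick remains trivially simple and reusable in other jobs.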
