Towards Conceptual MapReduce Algorithm for Big Data Platform

Much of the appeal of MapReduce comes from its simplicity: beyond preparing the input data, the programmer needs only to implement the mapper, the reducer, and, optionally, the combiner and the partitioner. All other aspects of execution are handled transparently by the execution framework, on clusters ranging from a single node to a few thousand nodes and over datasets ranging from gigabytes to petabytes. However, this also means that any algorithm a programmer wishes to develop must be expressed in terms of a small number of rigidly defined components that must fit together in very specific ways. It is not always obvious how a wide range of algorithms can be recast into this programming model. The purpose of this paper is to provide a guide to MapReduce algorithm design. It presents the notion of MapReduce design patterns, which instantiate arrangements of components and specific techniques designed to handle frequently encountered situations across a variety of domains.
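As an illustration of the four components named above, the following is a minimal single-process sketch of a word-count job in Python. It is not taken from the paper and does not use any real MapReduce framework: the function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`) and the in-memory shuffle are assumptions made purely for illustration of how the pieces fit together.

```python
from collections import defaultdict


def mapper(doc_id, text):
    """Emit an intermediate (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def combiner(word, counts):
    """Pre-aggregate counts on the map side to shrink the data that is shuffled."""
    yield word, sum(counts)


def partitioner(key, num_reducers):
    """Assign a key to a reducer. A byte sum is used instead of hash() so the
    assignment is deterministic across runs (illustrative choice only)."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(word, counts):
    """Sum all partial counts that arrive for one word."""
    yield word, sum(counts)


def run_job(documents, num_reducers=2):
    """Hypothetical in-memory driver that stands in for the execution framework."""
    # One bucket of key -> list of values per reducer (the "shuffle" target).
    partitions = [defaultdict(list) for _ in range(num_reducers)]

    for doc_id, text in documents.items():
        # Map phase: group this mapper's output by key.
        local = defaultdict(list)
        for key, value in mapper(doc_id, text):
            local[key].append(value)
        # Combine phase, then route each combined pair to its partition.
        for key, values in local.items():
            for ckey, cvalue in combiner(key, values):
                partitions[partitioner(ckey, num_reducers)][ckey].append(cvalue)

    # Reduce phase: each partition processes its keys in sorted order.
    result = {}
    for partition in partitions:
        for key in sorted(partition):
            for rkey, rvalue in reducer(key, partition[key]):
                result[rkey] = rvalue
    return result


if __name__ == "__main__":
    docs = {"d1": "big data big platform", "d2": "data platform design"}
    print(run_job(docs))
```

In this sketch the programmer-supplied logic is confined to the four small functions, while `run_job` plays the role of the framework; in a real deployment the grouping, routing, and scheduling would be distributed across the cluster.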
