Simplifying MapReduce Data Processing

MapReduce is a programming model developed by Google for processing and generating large data sets in distributed environments. Many real-world tasks can be expressed as two functions, map and reduce. MapReduce plays a key role in cloud computing because it reduces the complexity of distributed programming and is easy to deploy on large clusters of commodity machines. Hadoop is an open-source implementation of Google's MapReduce architecture, and it is widely used by companies such as Facebook, Yahoo, and Twitter. However, ordinary users find it difficult to decompose an application into map and reduce functions. In this paper, we develop a web-based graphical user interface that lets ordinary users run MapReduce jobs without writing any code. Users only have to know how to specify their tasks as target-value-action tuples. Real examples are provided for demonstration.
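As an illustration of how a task splits into map and reduce functions, the following is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code and not the interface described in this paper; the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Toy driver (an assumption, in-memory only): map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle phase
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

In a real cluster the grouping step is performed by the framework across machines; the user supplies only the two functions, which is the part that ordinary users tend to find difficult to write.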
