Performance Research on MapReduce Programming Model

The MapReduce programming model is designed to process large data sets in parallel on large clusters. Most organizations, however, cannot afford to build a large cluster, so building a small cluster to improve the efficiency of time-consuming applications is a practical alternative. In addition, programs that are not data-intensive are common, which raises the question of whether MapReduce is suitable for them. In this paper, a small cluster of 5 PCs is built and its configuration is tuned. A distributed FTP scan program is then written to test whether MapReduce is suitable for network-intensive programs that work on small data sets. Finally, a distributed string search program is written to test the performance of MapReduce on large data sets. The results show that MapReduce can run efficiently on a small cluster, and that it is also suitable for network-I/O-intensive programs with small data sets.
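As an illustration of how a string search maps onto the MapReduce model, the following is a minimal single-process sketch in Python. It is not the program used in the paper, which is not shown here, and it does not use Hadoop. The function names `map_search`, `reduce_matches`, and `run_search` are hypothetical, and the grouping step stands in for the shuffle phase that a real cluster would perform across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_search(filename: str, text: str, pattern: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (filename, line_number) for every line containing pattern."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        if pattern in line:
            yield filename, line_number


def reduce_matches(filename: str, line_numbers: Iterable[int]) -> tuple[str, list[int]]:
    """Reduce step: collect all matching line numbers for one file, sorted."""
    return filename, sorted(line_numbers)


def run_search(documents: dict[str, str], pattern: str) -> dict[str, list[int]]:
    """Run map, group-by-key (a stand-in for the shuffle), and reduce in one process."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for filename, text in documents.items():
        for key, value in map_search(filename, text, pattern):
            grouped[key].append(value)
    return dict(reduce_matches(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: two small in-memory "files".
    docs = {
        "a.txt": "hello world\nfoo\nhello again",
        "b.txt": "nothing here",
    }
    print(run_search(docs, "hello"))
```

In a real deployment each call to `map_search` would run on the node holding that input split, so the work scales with the number of nodes rather than running in a single loop as it does here.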
