Paper Information - Performance Analysis of MapReduce-Based Distributed Systems for Iterative Data Processing Applications

Performance Analysis of MapReduce-Based Distributed Systems for Iterative Data Processing Applications

Research on big data has recently become active because large volumes of data are generated in scientific fields such as biology and astronomy. Accordingly, distributed data processing techniques have been studied to manage big data across a large number of servers. Meanwhile, some scientific applications, such as genome data analysis, require loop control when analyzing big data with a MapReduce framework. In this paper, we first describe the existing MapReduce-based distributed systems that support iterative data processing. We then analyze the performance of these systems in terms of execution time for various scientific applications that require iterative data processing. Finally, based on this analysis, we discuss requirements for a new MapReduce-based distributed system that supports iterative data processing efficiently.

Heeseung Jo | Min Yoon | Jae-Woo Chang | Dong Hoon Choi | Hyeong-Il Kim

[1] Geoffrey C. Fox, et al. Twister: a runtime for iterative MapReduce, 2010, HPDC '10.

[2] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.

[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[4] Michael D. Ernst, et al. HaLoop, 2010, Proc. VLDB Endow.