Paper Information - Dynamic Data Partitioning and Virtual Machine Mapping: Efficient Data Intensive Computation

Big data refers to data sets so large that they exceed the processing capabilities of traditional systems. Such data can be awkward to work with, and its storage, processing, and analysis can be problematic. MapReduce is a recent programming model that can handle big data by distributing the storage and processing of data among a large number of computers (nodes). However, this means the time required to complete a MapReduce job depends on whichever node is last to finish its task. This problem is exacerbated in heterogeneous environments, where nodes differ in processing capability. In this paper we propose a method to improve MapReduce execution in heterogeneous environments. The method dynamically partitions data during the Map phase and uses virtual machine mapping in the Reduce phase in order to maximize resource utilization.
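The straggler effect described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which partitions data dynamically at run time; it is a static toy model that only shows why an even split is slow on heterogeneous nodes. The function names, the relative node speeds, and the workload size are all hypothetical values chosen for illustration.

```python
def makespan(shares, speeds):
    """Job completion time: the finish time of the slowest node."""
    return max(share / speed for share, speed in zip(shares, speeds))


def equal_partition(total, speeds):
    """Give every node the same amount of data, ignoring its speed."""
    return [total / len(speeds)] * len(speeds)


def proportional_partition(total, speeds):
    """Give each node data in proportion to its processing speed."""
    capacity = sum(speeds)
    return [total * speed / capacity for speed in speeds]


# Hypothetical cluster: three nodes with relative speeds 1x, 2x, and 4x.
speeds = [1.0, 2.0, 4.0]
total = 700.0  # hypothetical workload, in arbitrary data units

even_time = makespan(equal_partition(total, speeds), speeds)
balanced_time = makespan(proportional_partition(total, speeds), speeds)

print(f"equal split:        {even_time:.1f}")
print(f"proportional split: {balanced_time:.1f}")
```

Under these assumed speeds, the even split is bounded by the slowest node, while the proportional split lets all three nodes finish at the same time.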

Ching-Hsien Hsu | Yeh-Ching Chung | Kenn Slagter