A New Scheduling Strategy for Heterogeneous Workload Awareness in Hadoop

Demand for large-scale data mining and data analysis has led both industry and academia to design highly scalable data-intensive computing platforms. MapReduce is a well-known programming model for processing large amounts of data. However, current implementations perform poorly and are inefficient even when running a single MapReduce job. In practice, clusters must manage and process enormous data volumes by running many jobs concurrently rather than a single job, and different jobs differ markedly in how they request and utilize resources. Most scheduling strategies applied in Hadoop ignore these differences, so resource utilization and job processing efficiency may suffer. To address this problem, we put forward a scheduling strategy based on job type classification, which consists of two parts. 1) Dynamically divide jobs into two types based on the cluster's historical operating data: CPU-intensive and I/O-intensive. 2) Remove the influence of noisy records on the reliability of the historical data, and schedule jobs with CICS (CPU and I/O Characteristic Estimation Strategy), which is based on classical FCFS and substantially modified for fairness.
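The classification-plus-FCFS idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the ratio threshold, the field names, and the two-queue interleaving are all illustrative assumptions; the sketch only shows how historical CPU and I/O times could label jobs and how two FCFS queues could overlap complementary job types.

```python
# Hypothetical sketch of job-type classification and two-queue FCFS dispatch;
# all names and the threshold are illustrative, not from the paper.
from collections import deque
from dataclasses import dataclass

@dataclass
class JobHistory:
    name: str
    cpu_seconds: float   # total CPU time observed in past runs
    io_seconds: float    # total time spent blocked on disk/network I/O

def classify(job: JobHistory, threshold: float = 1.0) -> str:
    """Label a job by its historical CPU-time / I/O-time ratio.

    `threshold` is an assumed tuning parameter: ratios above it mark the
    job CPU-intensive, ratios below it mark it I/O-intensive.
    """
    ratio = job.cpu_seconds / max(job.io_seconds, 1e-9)
    return "CPU-intensive" if ratio > threshold else "I/O-intensive"

def schedule(jobs):
    """FCFS within each class; alternate between classes when both queues
    are non-empty, so a CPU-bound and an I/O-bound job can overlap."""
    cpu_q, io_q = deque(), deque()
    for j in jobs:
        (cpu_q if classify(j) == "CPU-intensive" else io_q).append(j)
    order = []
    while cpu_q or io_q:
        if cpu_q:
            order.append(cpu_q.popleft().name)
        if io_q:
            order.append(io_q.popleft().name)
    return order

jobs = [
    JobHistory("wordcount", cpu_seconds=120, io_seconds=300),   # I/O-bound
    JobHistory("pi-estimate", cpu_seconds=400, io_seconds=20),  # CPU-bound
    JobHistory("log-scan", cpu_seconds=60, io_seconds=500),     # I/O-bound
]
print(schedule(jobs))  # → ['pi-estimate', 'wordcount', 'log-scan']
```

The interleaving keeps arrival order within each class (FCFS) while pairing jobs with complementary resource demands, which is the intuition behind classifying jobs before scheduling them.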