A New Scheduling Strategy for Heterogeneous Workload Awareness in Hadoop

Demand for large-scale data mining and data analysis has led both industry and academia to design highly scalable data-intensive computing platforms. MapReduce is a well-known programming model for processing large amounts of data. However, current implementations perform poorly and are inefficient even when running a single MapReduce job. In practice, clusters must manage and process enormous data volumes by running many jobs concurrently rather than a single job, and different jobs differ markedly in how they request and utilize resources. Most scheduling strategies applied in Hadoop ignore these differences, so resource utilization and job processing efficiency may suffer. To address this problem, we put forward a scheduling strategy based on job type classification, which consists of two parts. 1) Dynamically divide jobs into two types based on the cluster's historical operating data: CPU-intensive and I/O-intensive. 2) Remove the influence of noisy records on the reliability of the historical data, and schedule jobs with CICS (CPU and I/O Characteristic Estimation Strategy), which is based on classical FCFS and substantially modified for fairness.
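The classification-plus-FCFS idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the ratio threshold, the field names, and the two-queue interleaving are all illustrative assumptions; the sketch only shows how historical CPU and I/O times could label jobs and how two FCFS queues could overlap complementary job types.

```python
# Hypothetical sketch of job-type classification and two-queue FCFS dispatch;
# all names and the threshold are illustrative, not from the paper.
from collections import deque
from dataclasses import dataclass

@dataclass
class JobHistory:
    name: str
    cpu_seconds: float   # total CPU time observed in past runs
    io_seconds: float    # total time spent blocked on disk/network I/O

def classify(job: JobHistory, threshold: float = 1.0) -> str:
    """Label a job by its historical CPU-time / I/O-time ratio.

    `threshold` is an assumed tuning parameter: ratios above it mark the
    job CPU-intensive, ratios below it mark it I/O-intensive.
    """
    ratio = job.cpu_seconds / max(job.io_seconds, 1e-9)
    return "CPU-intensive" if ratio > threshold else "I/O-intensive"

def schedule(jobs):
    """FCFS within each class; alternate between classes when both queues
    are non-empty, so a CPU-bound and an I/O-bound job can overlap."""
    cpu_q, io_q = deque(), deque()
    for j in jobs:
        (cpu_q if classify(j) == "CPU-intensive" else io_q).append(j)
    order = []
    while cpu_q or io_q:
        if cpu_q:
            order.append(cpu_q.popleft().name)
        if io_q:
            order.append(io_q.popleft().name)
    return order

jobs = [
    JobHistory("wordcount", cpu_seconds=120, io_seconds=300),   # I/O-bound
    JobHistory("pi-estimate", cpu_seconds=400, io_seconds=20),  # CPU-bound
    JobHistory("log-scan", cpu_seconds=60, io_seconds=500),     # I/O-bound
]
print(schedule(jobs))  # → ['pi-estimate', 'wordcount', 'log-scan']
```

The interleaving keeps arrival order within each class (FCFS) while pairing jobs with complementary resource demands, which is the intuition behind classifying jobs before scheduling them.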