Data-Centric Task Scheduling Algorithm for Hybrid Tasks in Cloud Data Centers

With the development of big data, the demand for data analysis keeps increasing. This demand calls for a data-aware task scheduling approach that can efficiently schedule diverse tasks, such as batched tasks and real-time tasks, simultaneously in a data center. To this end, we propose a hybrid task scheduling strategy coupled with data migration in data centers. First, we translate the task scheduling problem into a task selection problem and give methods for selecting batched tasks and real-time tasks, respectively. Then, we describe in detail the method for scheduling batched and real-time tasks together. Finally, we integrate data migration into the hybrid scheduling strategy. Experimental results show that, compared with the traditional FIFO algorithm, the proposed strategy greatly improves data locality, and that data migration is effective at reducing job execution time. Our algorithm also guarantees acceptable fairness among tasks.
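The abstract frames scheduling as task selection: whenever a node frees a slot, the scheduler decides which pending task to run there. The sketch below illustrates only that framing, under assumptions of our own; it is not the authors' algorithm. The task classes `realtime` and `batch`, the `data_nodes` replica set, and the priority rule (real-time before batch, and within a class a data-local task before a remote one) are all hypothetical, and data migration is not modeled. It contrasts this rule with a FIFO baseline by counting how many assignments are data-local.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    kind: str              # hypothetical class label: "realtime" or "batch"
    data_nodes: frozenset  # hypothetical: nodes holding a replica of the task's input


def fifo_select(queue, node):
    """Baseline: take the head of the queue regardless of where its data lives."""
    return queue.pop(0) if queue else None


def locality_select(queue, node):
    """Hypothetical data-aware rule: real-time tasks before batch tasks; within a
    class, prefer a task whose input is on `node`, else the oldest task of that class."""
    for kind in ("realtime", "batch"):
        candidates = [t for t in queue if t.kind == kind]
        if not candidates:
            continue
        chosen = next((t for t in candidates if node in t.data_nodes), candidates[0])
        queue.remove(chosen)
        return chosen
    return None


def run(policy, tasks, free_slots):
    """Assign tasks as nodes free up; return the dispatch order and the local count."""
    queue = list(tasks)  # copy so the caller's task list is not mutated
    order, local = [], 0
    for node in free_slots:  # each entry names the node that just freed a slot
        task = policy(queue, node)
        if task is None:
            break
        order.append(task.name)
        local += int(node in task.data_nodes)
    return order, local


tasks = [
    Task("b1", "batch", frozenset({"n2"})),
    Task("b2", "batch", frozenset({"n1"})),
    Task("r1", "realtime", frozenset({"n2"})),
    Task("b3", "batch", frozenset({"n3"})),
]
free_slots = ["n1", "n2", "n3", "n1"]
print(run(fifo_select, tasks, free_slots))
print(run(locality_select, tasks, free_slots))
```

On this toy input (not the paper's experiments), FIFO runs no task on a node holding its data, while the selection rule dispatches `r1` immediately and then places all three batch tasks locally. A real scheduler would also need a fairness mechanism, since a pure locality preference can starve tasks whose data sits on busy nodes.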
