Migration-Based Online CPSCN Big Data Analysis in Data Centers

Scheduling online data-intensive jobs effectively is critical for many applications, including cyber-physical systems and social network systems, and it supports timely decision making and better prediction. In this paper, we investigate the online job scheduling problem with data migration, with the goal of reducing global job execution time. We first establish a time model based on real experimental results and propose an online job placement algorithm that accounts for both the instantaneity and the locality of jobs. We then add data migration to the job placement algorithm; the core idea is to trade off the migration cost against the remote access cost. Simulation results show that our algorithm significantly outperforms FIFO and that data migration effectively reduces global job execution time. Our algorithms also provide acceptable fairness across jobs.
