Dynamic I/O-Aware Scheduling for Batch-Mode Applications on Chip Multiprocessor Systems of Cluster Platforms

Efficiency of batch processing is becoming increasingly important for many modern commercial service centers, e.g., clusters and cloud computing datacenters. However, periodical resource contentions have become the major performance obstacles for concurrently running applications on mainstream CMP servers. I/O contention is such a kind of obstacle, which may impede both the co-running performance of batch jobs and the system throughput seriously. In this paper, a dynamic I/O-aware scheduling algorithm is proposed to lower the impacts of I/O contention and to enhance the co-running performance in batch processing. We set up our environment on an 8-socket, 64-core server in Dawning Linux Cluster. Fifteen workloads ranging from 8 jobs to 256 jobs are evaluated. Our experimental results show significant improvements on the throughputs of the workloads, which range from 7% to 431%. Meanwhile, noticeable improvements on the slowdown of workloads and the average runtime for each job can be achieved. These results show that a well-tuned dynamic I/O-aware scheduler is beneficial for batch-mode services. It can also enhance the resource utilization via throughput improvement on modern service platforms.

[1]  Qing Yi,et al.  Layout-oblivious compiler optimization for matrix computations , 2013, TACO.

[2]  Randy H. Katz,et al.  Above the Clouds: A Berkeley View of Cloud Computing , 2009 .

[3]  Joel H. Saltz,et al.  Tuning the performance of I/O-intensive parallel applications , 1996, IOPADS '96.

[4]  Christoph Lameter,et al.  Local and Remote Memory: Memory in a Linux/NUMA System , 2006 .

[5]  Xiaobing Feng,et al.  Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors , 2010, NPC.

[6]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[7]  Z. Lin,et al.  Parallelizing I/O intensive applications for a workstation cluster: a case study , 1993, CARN.

[8]  Ravi Jain,et al.  Scheduling Parallel I/O Operations in Multiple Bus Systems , 1992, J. Parallel Distributed Comput..

[9]  Thomas R. Gross,et al.  Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead , 2011, ISMM '11.

[10]  Lin Gao,et al.  Loop recreation for thread‐level speculation on multicore processors , 2010, Softw. Pract. Exp..

[11]  Alok Choudhary,et al.  Exploiting Shared Memory to Improve Parallel I/O Performance , 2006, PVM/MPI.

[12]  Dongrui Fan,et al.  Extendable pattern-oriented optimization directives , 2012, International Symposium on Code Generation and Optimization (CGO 2011).

[13]  Lingjia Tang,et al.  Compiling for niceness: mitigating contention for QoS in warehouse scale computers , 2012, CGO '12.

[14]  Chita R. Das,et al.  Towards characterizing cloud backend workloads: insights from Google compute clusters , 2010, PERV.

[15]  Pen-Chung Yew,et al.  On mitigating memory bandwidth contention through bandwidth-aware scheduling , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Kyung D. Ryu,et al.  Efficient Network and I/O Throttling for Fine-Grain Cycle Stealing , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[17]  김성찬,et al.  고 변환이득 및 격리 특성의 V-band용 4체배 Sub-harmonic Mixer , 2003 .

[18]  Rajeev Thakur,et al.  Data sieving and collective I/O in ROMIO , 1998, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.

[19]  Lin Gao,et al.  Exploiting Speculative TLP in Recursive Programs by Dynamic Thread Prediction , 2009, CC.

[20]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[21]  Alan L. Cox,et al.  Scheduling I/O in virtual machine monitors , 2008, VEE '08.

[22]  Li Chen,et al.  PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization , 2009, Journal of Computer Science and Technology.

[23]  Anand Sivasubramaniam,et al.  Xen and co.: communication-aware CPU scheduling for consolidated xen-based hosting platforms , 2007, VEE '07.

[24]  Manuel Prieto,et al.  Survey of scheduling techniques for addressing shared resources in multicore processors , 2012, CSUR.

[25]  Kevin Skadron,et al.  Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS XV.

[27]  Lin Gao,et al.  Thread-Sensitive Modulo Scheduling for Multicore Processors , 2008, 2008 37th International Conference on Parallel Processing.

[28]  Ioan Raicu,et al.  I/O Throttling and Coordination for MapReduce , 2012 .

[29]  Kai Shen,et al.  FIOS: a fair, efficient flash I/O scheduler , 2012, FAST.

[30]  Xiaobing Feng,et al.  An empirical model for predicting cross-core performance interference on multicore processors , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[31]  Chen Ding,et al.  Defensive loop tiling for shared cache , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[32]  Yang Yang,et al.  Automatic Library Generation for BLAS3 on GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[33]  Devarshi Ghoshal,et al.  I/O performance of virtualized cloud environments , 2011, DataCloud-SC '11.

[34]  Ravi Jain,et al.  Heuristics for Scheduling I/O Operations , 1997, IEEE Trans. Parallel Distributed Syst..

[35]  Lin Gao,et al.  Loop recreation for thread-level speculation , 2007, 2007 International Conference on Parallel and Distributed Systems.

[36]  Jie Chen,et al.  Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[37]  Luiz André Barroso,et al.  The Case for Energy-Proportional Computing , 2007, Computer.