Towards the contention aware scheduling in HPC cluster environment

Contention for shared resources in High-Performance Computing (HPC) clusters occurs when jobs execute concurrently on the same multicore node, where they compete for shared caches, memory buses, memory controllers and memory domains. This contention severely degrades workload performance and stability and must therefore be addressed. State-of-the-art HPC clusters, however, are not contention-aware. The goal of this work is to design, implement and evaluate a virtualized HPC cluster framework that is contention-aware.
