Enabling GPU and Many-Core Systems in Heterogeneous HPC Environments Using Memory Considerations

Increasing the utilization of many-core systems has been a leading research topic in recent years. Although many-core architectures were merely theoretical models a few years ago, they have become an important part of the high performance computing (HPC) market. The semiconductor industry has developed Graphics Processing Units (GPUs) that provide access to many cores (e.g., Larrabee, Fermi, or Tesla) and can be used for general-purpose (GP) computing. In this paper, we propose and evaluate a scheduling strategy for GPU and many-core architectures in HPC environments. Specifically, our strategy is a variant of the backfilling scheduling policy with resource-sharing considerations: it accounts for the differences between GP processors and GPU computing elements in terms of computational capacity and memory bandwidth. To do this, our approach uses a resource model that predicts how shared resources are used by both GP and GPU/many-core elements. First, the model captures their differences in computational power and how they share access to the node's memory bandwidth. Second, it characterizes how processes are allocated to the GPU. Using this resource model, we design the Power Aware resource selection policy, which we combine with the Less Consume scheduling policy. Our strategy allocates jobs with the aim of reducing both memory contention and energy consumption. Results show that the proposed scheduling strategies reduce energy consumption by more than 40% and improve system performance by up to 30% with respect to traditional backfilling strategies.
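The abstract describes the resource model only at a high level. To illustrate the general idea of contention-aware, energy-driven resource selection, here is a minimal Python sketch. It assumes a simple proportional-sharing model of node memory bandwidth and an energy estimate of power times runtime. The classes `Node` and `Job`, their fields, and the functions `predicted_slowdown`, `predicted_energy`, and `select_node` are hypothetical. They are not the authors' resource model or their Power Aware policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node description; fields and units are illustrative.
    name: str
    kind: str                   # "gp" (general-purpose CPU) or "gpu" (many-core)
    free_cores: int
    mem_bw_capacity: float      # GB/s the node's memory can sustain (assumed)
    mem_bw_in_use: float = 0.0  # GB/s already demanded by running jobs
    speedup: float = 1.0        # compute speed relative to one GP core (assumed)
    power_watts: float = 100.0  # assumed average power draw while busy


@dataclass
class Job:
    name: str
    cores: int
    bw_per_core: float   # GB/s demanded per allocated core (assumed known)
    base_runtime: float  # seconds on GP cores without memory contention


def predicted_slowdown(node: Node, job: Job) -> float:
    """Slowdown from memory contention if `job` were placed on `node`.

    Proportional-sharing assumption: when aggregate demand exceeds the
    node's capacity, execution is stretched by demand / capacity.
    """
    demand = node.mem_bw_in_use + job.cores * job.bw_per_core
    return max(1.0, demand / node.mem_bw_capacity)


def predicted_energy(node: Node, job: Job) -> float:
    """Energy estimate in joules: power * contention-adjusted runtime."""
    runtime = job.base_runtime / node.speedup * predicted_slowdown(node, job)
    return node.power_watts * runtime


def select_node(nodes: List[Node], job: Job) -> Optional[Node]:
    """Return the feasible node with the lowest predicted energy.

    Ties are broken by lower predicted slowdown. Returns None when no
    node has enough free cores, i.e. the job must wait in the queue.
    """
    feasible = [n for n in nodes if n.free_cores >= job.cores]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda n: (predicted_energy(n, job), predicted_slowdown(n, job)),
    )


if __name__ == "__main__":
    nodes = [
        Node("gp0", "gp", free_cores=8, mem_bw_capacity=20.0,
             mem_bw_in_use=16.0, speedup=1.0, power_watts=200.0),
        Node("gpu0", "gpu", free_cores=240, mem_bw_capacity=100.0,
             mem_bw_in_use=10.0, speedup=4.0, power_watts=300.0),
    ]
    job = Job("j1", cores=4, bw_per_core=2.0, base_runtime=100.0)
    chosen = select_node(nodes, job)
    print(chosen.name if chosen else "wait")  # prints "gpu0"
```

With these illustrative numbers, placing the job on `gp0` raises memory demand to 24 GB/s against a 20 GB/s capacity. That gives a slowdown of 1.2 and a predicted energy of 24,000 J. On `gpu0`, demand stays below capacity and the assumed speedup is 4, which gives 7,500 J, so the sketch selects `gpu0`. In a backfilling scheduler, a selection function like this would only be consulted for jobs that fit within the current reservation window. That integration is not shown here.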
