Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms

Heterogeneous platforms that integrate different processors, such as GPUs and multi-core CPUs, have become popular in high-performance computing. While most applications currently use only the homogeneous parts of these platforms, we argue that a large class of applications can benefit from their heterogeneity: massively parallel imbalanced applications. Such applications emerge, for example, from numerical methods and simulations based on variable time steps. In this paper, we present Glinda, a framework for accelerating imbalanced applications on heterogeneous computing platforms. Glinda detects the application's workload characteristics, chooses among the available parallel solutions according to the hardware configuration, and automatically determines the optimal workload decomposition and distribution. Our experiments on parallelizing a heavily imbalanced acoustic ray tracing application show that Glinda improves application performance in multiple scenarios, achieving a speedup of up to 12x over manually configured parallel solutions.
