Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

The past decade has witnessed a major paradigm shift in high-performance computing with the introduction of accelerators as general-purpose processors. These computing devices deliver very high parallel computing power at low cost and low power consumption, transforming current high-performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical peak performance of these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either the GPU or the CPU, leaving the other resource underutilized or idle. In this paper, we propose, implement, and evaluate a performance-aware scheduling technique, along with optimizations, to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large-scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches.
