Heterogeneous tasks and conduits framework for rapid application portability and deployment

Emerging heterogeneous and homogeneous processing architectures demonstrate significantly higher throughput for scientific applications than traditional single-core processors. These architectures vary widely in their processing capabilities, memory hierarchies, and programming models. Within this rapidly expanding and evolving design space, determining the system architecture best suited to an application, or deploying an application that is portable across a number of different platforms, is increasingly complex and error-prone. Designing portable, high-performance applications quickly and easily, while ensuring that they run correctly across such widely varied systems, has become paramount. Meeting these programming challenges requires new models and tools. One example is MIT Lincoln Laboratory's Parallel Vector Tile Optimizing Library (PVTOL), which simplifies the development of C++ software for these complex systems. This work extends the Tasks and Conduits framework in PVTOL to support GPU architectures and other heterogeneous platforms supported by the NVIDIA CUDA and OpenCL programming models. The extension allows applications to be ported rapidly to a very wide range of architectures and clusters. Using this framework, porting an application from a single CPU core to a GPU requires changing only 5 source lines of code (SLOC), in addition to writing the CUDA or OpenCL kernel. Using GPU-PVTOL, we have achieved a 22x speedup in a Monte Carlo simulation of photon propagation through a biological medium and a 60x speedup of a 3D cone beam computed tomography (CT) image reconstruction algorithm.
