Unified and lightweight tasks and conduits: A high level parallel programming framework

Computing platforms for high-performance parallel applications have changed rapidly in recent years, from single-core to multicore processors, and from traditional Central Processing Units (CPUs) to hybrid systems that combine CPUs with accelerators such as Graphics Processing Units (GPUs) and the Intel Xeon Phi. These developments pose growing challenges to application developers, especially in maintaining high performance across diverse platforms. To reduce development effort and improve application portability, we propose a high-level parallel programming framework called Unified Tasks and Conduits (UTC). An application expressed as multiple tasks can be mapped to the available computing devices of a target platform and run in parallel. The framework decouples the high-level program structure from low-level task execution, making it easy to port applications across platforms. To support the framework, we have implemented a lightweight, flexible runtime system prototype that provides a set of utilities for conveniently creating parallel applications on multicore systems and clusters. Our micro-benchmarks show that the runtime system introduces little overhead for implementing tasks and conduits, and application tests show that UTC-based programs can exploit available computing resources for efficient parallel processing while easing the developer's job.
