Synthesizing Software from a ForSyDe Model Targeting GPGPUs

Today, a plethora of parallel execution platforms are available. One platform in particular is the GPGPU, a massively parallel architecture designed for exploiting data parallelism. However, GPGPUs are notoriously difficult to program due to the way data is accessed and processed, and many interconnected factors affect performance. This makes it exceptionally challenging to write correct and high-performing applications for GPGPUs. This thesis project aims to address this problem by investigating how models expressed in ForSyDe, a software engineering methodology in which applications are modeled at a very high level of abstraction, can be synthesized into CUDA C code for execution on NVIDIA CUDA-enabled graphics cards. The report proposes a software synthesis process which discovers one type of potential data parallelism in a model and generates either pure C or CUDA C code. A prototype of the software synthesis component has also been implemented and tested on models derived from two applications: a Mandelbrot generator and an industrial-scale image processor. The synthesized CUDA code produced in the tests was shown to be both correct and efficient, provided there was enough computational complexity in the processes to amortize the overhead cost of using the GPGPU.
