Automatic Software Synthesis from High-Level ForSyDe Models Targeting Massively Parallel Processors

In the past decade we have witnessed an abrupt shift to parallel computing, driven by an increasing demand for performance and functionality that conventional paradigms can no longer satisfy. As a consequence, the abstraction gap between applications and the underlying hardware has widened, prompting research in several directions in both industry and academia. This thesis project analyzes some of these directions in order to offer a solution for bridging the abstraction gap between the description of a problem at a functional level and its implementation on a heterogeneous parallel platform using ForSyDe, a formal design methodology. This report treats applications employing data-parallel and time-parallel computation and regards NVIDIA CUDA-enabled GPGPUs as the main backend platform. It proposes a heuristic transformation-and-refinement process, based on analysis methods and design decisions, to automate and aid a correct-by-design synthesis of backend code. The purpose of this process is to identify potential data parallelism and time parallelism in a high-level system. Furthermore, based on a basic platform model, the algorithm load-balances the execution and maps it onto the best computation resources in an automated design flow. This design flow will be embedded into an existing tool, f2cc (ForSyDe-to-CUDA C), and tested for correctness on an industrial-scale image processing application aimed at monitoring the reliability of inkjet print heads.
