A Methodology to Develop High Performance Applications on GPGPU Architectures: Application to Simulation of Electrical Machines. (Une Méthodologie pour le Développement d'Applications Hautes Performances sur des Architectures GPGPU: Application à la Simulation des Machines Électriques)

Complex physical phenomena can be simulated numerically using mathematical techniques. These techniques are usually based on the discretization of the partial differential equations that govern the phenomena, so the simulations require solving large-scale systems. Parallelizing numerical simulation algorithms, i.e., adapting them to parallel processing architectures, is necessary to avoid prohibitive execution times. Parallelism has become the norm at the level of processor architectures, and graphics cards are now used for general-purpose computation, known as "General-Purpose computation on Graphics Processing Units (GPGPU)". The clear benefit is an excellent performance/price ratio. This thesis addresses the design of high-performance applications for the simulation of electrical machines. We provide a methodology based on Model Driven Engineering (MDE) to model an application and its execution architecture in order to generate OpenCL code. Our goal is to help specialists in numerical simulation algorithms create code that runs efficiently on GPGPU architectures. To this end, we offer a model compilation chain that takes into account several aspects of the OpenCL programming model. In addition, so that the generated code is reasonably efficient compared with hand-written code, we provide model transformations that analyze several levels of optimization based on the characteristics of the architecture (e.g., memory issues). As an experimental validation, the methodology is applied to the creation of an application that solves a linear system resulting from the Finite Element Method (FEM) for the simulation of electrical machines. In this case, we show, among other things, that the methodology scales through a simple change in the number of available GPU devices.
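The idea of scaling by changing only the number of devices can be illustrated with a small sketch. This is not the OpenCL code generated by the methodology, and it runs on the CPU only: it simply splits a sparse matrix-vector product, a typical kernel when solving FEM linear systems, into one contiguous row block per device. The names `split_rows`, `spmv_block`, `spmv`, and `tridiagonal`, the row-block partitioning scheme, and the tridiagonal test matrix are all assumptions made for illustration.

```python
from typing import List, Tuple

# A sparse row is a list of (column index, value) pairs (hypothetical layout).
Row = List[Tuple[int, float]]


def split_rows(n_rows: int, n_devices: int) -> List[range]:
    """Split row indices into contiguous, near-equal blocks, one per device."""
    base, extra = divmod(n_rows, n_devices)
    blocks, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def spmv_block(rows: List[Row], x: List[float], block: range) -> List[float]:
    """Compute the entries of y = A x for the rows in one block.

    In a real multi-GPU setting each block would be handled by one device;
    here it is an ordinary Python loop standing in for that work.
    """
    return [sum(v * x[j] for j, v in rows[i]) for i in block]


def spmv(rows: List[Row], x: List[float], n_devices: int) -> List[float]:
    """Assemble y = A x from per-device blocks; only n_devices changes."""
    y: List[float] = []
    for block in split_rows(len(rows), n_devices):
        y.extend(spmv_block(rows, x, block))
    return y


def tridiagonal(n: int) -> List[Row]:
    """Build a small (-1, 2, -1) tridiagonal matrix as a stand-in for a FEM matrix."""
    rows: List[Row] = []
    for i in range(n):
        row: Row = []
        if i > 0:
            row.append((i - 1, -1.0))
        row.append((i, 2.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


if __name__ == "__main__":
    A = tridiagonal(5)
    x = [1.0] * 5
    for n in (1, 2, 3):
        print(n, spmv(A, x, n))
```

Under these assumptions the result is the same for any device count; only the partitioning of the rows differs, which is the property the abstract describes at the level of generated multi-GPU code.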
