Contributions to the Design of Reliable and Programmable High-Performance Systems: Principles, Interfaces, Algorithms and Tools. (Contributions à la conception de systèmes à hautes performances, programmables et sûrs: principes, interfaces, algorithmes et outils)

Moore's law on semiconductors is coming to an end. Scaling the von Neumann architecture over the 40 years of the microprocessor has led to unsustainable circuit complexity, very low compute density, and high power consumption. Meanwhile, parallel computing practices are nowhere close to the portability, accessibility, productivity, and reliability of single-threaded software engineering. This dangerous gap translates into exciting challenges for compilation and programming language research in high-performance, general-purpose, and embedded computing. This thesis motivates our approach to these challenges, introduces our main directions and results, and outlines research perspectives.
