A semantics-based approach to optimizing unstructured mesh abstractions

Computational scientists are frequently confronted with a choice: implement algorithms using high-level abstractions, such as matrices and mesh entities, for greater programming productivity, or code them using low-level language constructs for greater execution efficiency. We have observed that the cost of implementing a representative unstructured mesh code with high-level abstractions is poor computational intensity (the ratio of floating-point operations to memory accesses). Related scientific applications frequently produce little “science per cycle” because their abstractions both introduce additional overhead and hinder compiler analysis and subsequent optimization. Our work exploits the semantics of abstractions, as employed in unstructured mesh codes, to overcome these limitations and to guide a series of manual, domain-specific optimizations that significantly improve computational intensity. We propose a framework for automating such high-level optimizations within the ROSE source-to-source compiler infrastructure. The specification of optimizations is left to domain experts and library writers, who best understand the semantics of their applications and libraries and are thus best positioned to describe those optimizations. Our source-to-source approach translates different constructs (e.g., C code written in a procedural style or C++ code written in an object-oriented style) to a procedural form in order to simplify the specification of optimizations. This is accomplished through raising operators, which are specified by a domain expert and are used to project a concrete application from an implementation space to an abstraction space, where optimizations are applied. The transformed code in the abstraction space is then reified as a concrete implementation via lowering operators, which are automatically inferred by inverting the raising operators.
Applying optimizations within the abstraction space, rather than the implementation space, leads to greater optimization portability. We use this framework to automate two high-level optimizations. The first uses an inspector/executor approach to avoid costly and redundant traversals of a static mesh by memoizing the relatively few references required to perform the mathematical computations. During the executor phase, the stored entities are accessed directly, without the indirection inherent in the original traversal. The second optimization lowers an object-oriented mesh framework, which uses C++ objects to access the mesh and iterate over mesh entities, to a low-level implementation that uses integer-based access and iteration.
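
The inspector/executor idea described above can be illustrated with a minimal C++ sketch. This is not the framework's generated code or any ROSE API: the types `EdgeRecord`, `Mesh`, and `Schedule`, the functions `residual_traversal`, `inspect`, and `residual_executor`, and the toy flux computation are all hypothetical names assumed here for illustration. The sketch assumes a static mesh in which each face reaches its two adjacent cells through one extra level of indirection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency record: the two cells that share a face.
struct EdgeRecord {
    std::size_t left_cell;
    std::size_t right_cell;
};

// Hypothetical object-style mesh. A face reaches its cells only through
// an extra lookup (face -> record slot -> cell ids).
struct Mesh {
    std::vector<EdgeRecord> records;          // adjacency storage
    std::vector<std::size_t> face_to_record;  // indirection table
    std::size_t num_cells = 0;

    std::size_t num_faces() const { return face_to_record.size(); }
    const EdgeRecord& face(std::size_t f) const {
        return records[face_to_record[f]];
    }
};

// Baseline: walks the mesh abstraction again on every call.
std::vector<double> residual_traversal(const Mesh& mesh,
                                       const std::vector<double>& u) {
    std::vector<double> r(mesh.num_cells, 0.0);
    for (std::size_t f = 0; f < mesh.num_faces(); ++f) {
        const EdgeRecord& e = mesh.face(f);
        const double flux = u[e.right_cell] - u[e.left_cell];
        r[e.left_cell] += flux;
        r[e.right_cell] -= flux;
    }
    return r;
}

// Memoized references: flat integer arrays, one entry per face.
struct Schedule {
    std::vector<std::size_t> left;
    std::vector<std::size_t> right;
};

// Inspector: traverses the static mesh once and records the cell ids
// that the computation actually needs.
Schedule inspect(const Mesh& mesh) {
    Schedule s;
    s.left.reserve(mesh.num_faces());
    s.right.reserve(mesh.num_faces());
    for (std::size_t f = 0; f < mesh.num_faces(); ++f) {
        const EdgeRecord& e = mesh.face(f);
        s.left.push_back(e.left_cell);
        s.right.push_back(e.right_cell);
    }
    return s;
}

// Executor: same arithmetic as the baseline, but reads the stored ids
// directly and never touches the mesh object.
std::vector<double> residual_executor(const Schedule& s,
                                      std::size_t num_cells,
                                      const std::vector<double>& u) {
    assert(s.left.size() == s.right.size());
    std::vector<double> r(num_cells, 0.0);
    for (std::size_t i = 0; i < s.left.size(); ++i) {
        const double flux = u[s.right[i]] - u[s.left[i]];
        r[s.left[i]] += flux;
        r[s.right[i]] -= flux;
    }
    return r;
}
```

In this sketch the inspector runs once, and the resulting `Schedule` can be reused across repeated executor calls for as long as the mesh connectivity does not change. The flat integer arrays also hint at the second optimization, which replaces object-based access and iteration with integer-based access and iteration.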
