Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language

Our ability to build systems with large amounts of hardware parallelism is outpacing the average software developer's ability to program them effectively, a problem that plagues our industry. Since the vast majority of the world's software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient core and vector parallelism is essential to enabling the use of multi- and many-core processor architectures. At the same time, hardware architectures and vector ISAs are shifting and diversifying quickly, making it difficult for a single binary to run well on all possible targets. Retargetability and dynamic compilation are therefore of growing relevance. This paper introduces Intel® Array Building Blocks (ArBB), a retargetable dynamic compilation framework. ArBB focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++. ArBB interoperates with other programming models to help meet customer demand for a solution that combines greater programmer productivity with good performance. This work makes contributions in language features, compiler architecture, code transformations, and optimizations. It presents performance data from the current beta release of ArBB and quantifies the impact of some key analyses, enabling transformations and optimizations on a variety of benchmarks that are of interest to our customers.
