Operator Language: A Program Generation Framework for Fast Kernels

We present the Operator Language (OL), a framework to automatically generate fast numerical kernels. OL provides the structure to extend the program generation system Spiral beyond the transform domain. Using OL, we show how to automatically generate library functionality for the fast Fourier transform and multiple non-transform kernels, including matrix-matrix multiplication, synthetic aperture radar (SAR), circular convolution, sorting networks, and Viterbi decoding. The control flow of the kernels is data-independent, which allows us to cast their algorithms as operator expressions. Using rewriting systems, a structural architecture model, and empirical search, we automatically generate very fast C implementations for state-of-the-art multicore CPUs that rival hand-tuned implementations.
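
To make the link between data-independent control flow and operator expressions concrete, the following is a minimal illustrative sketch in Python. It is not OL notation and not part of Spiral. The names `compare_exchange`, `compose`, `transposition_network`, and `sorting_operator` are hypothetical, and the odd-even transposition network is chosen here only for simplicity. A sorting network applies a fixed sequence of compare-exchange steps that depends only on the input size, so the whole kernel can be written as a composition of small operators.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Operator = Callable[[Vector], Vector]


def compare_exchange(i: int, j: int) -> Operator:
    """Operator that places the smaller of x[i], x[j] at i and the larger at j."""
    def op(x: Vector) -> Vector:
        y = list(x)  # operators are pure: the input vector is not modified
        y[i], y[j] = min(x[i], x[j]), max(x[i], x[j])
        return y
    return op


def compose(*ops: Operator) -> Operator:
    """Compose operators; they are applied left to right."""
    def op(x: Vector) -> Vector:
        for f in ops:
            x = f(x)
        return x
    return op


def transposition_network(n: int) -> List[Tuple[int, int]]:
    """Comparator positions of an odd-even transposition network.

    The list depends only on n, never on the values being sorted,
    which is what data-independent control flow means here.
    """
    return [(i, i + 1) for r in range(n) for i in range(r % 2, n - 1, 2)]


def sorting_operator(n: int) -> Operator:
    """Express the size-n sorting kernel as one composed operator."""
    return compose(*(compare_exchange(i, j) for i, j in transposition_network(n)))


if __name__ == "__main__":
    sort4 = sorting_operator(4)
    print(sort4([3, 1, 4, 2]))
```

Because the comparator list is fixed before any data is seen, the expression can in principle be manipulated symbolically (reordered, blocked, or mapped to parallel hardware) before code is emitted. This sketch only evaluates the expression directly and performs no such rewriting.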
