Computer Generation of General Size Linear Transform Libraries

The development of high-performance libraries has become extraordinarily difficult due to multiple processor cores, vector instruction sets, and deep memory hierarchies. Often, a library has to be reimplemented and reoptimized when a new platform is released. In this paper we show how to automatically generate general input-size libraries for the domain of linear transforms. The input to our generator is a formal specification of the transform and of the recursive algorithms the library should use; the output is a library that supports general input sizes, is vectorized and multithreaded, provides an adaptation mechanism for the memory hierarchy, and achieves performance comparable to or better than the best human-written libraries. Further, we show that our library generator enables various customizations; one example is the generation of Java libraries.
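
To make "general input size" and "recursive algorithm" concrete, the following Python sketch computes a discrete Fourier transform of any length by applying a Cooley-Tukey style recursion whenever the size factors, and falling back to the definition for prime sizes. It is only an illustration under our own assumptions: the function names (`dft`, `dft_direct`, `smallest_factor`) are hypothetical, the code is neither vectorized nor multithreaded, and it is not the generator's input language or its output.

```python
import cmath


def dft_direct(x):
    """O(n^2) DFT computed straight from the definition (base case)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * q / n) for j in range(n))
        for q in range(n)
    ]


def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def dft(x):
    """Recursive DFT for arbitrary length n = k * m (hypothetical sketch)."""
    n = len(x)
    if n <= 1:
        return dft_direct(x)
    k = smallest_factor(n)
    if k == n:
        # Prime size: no factorization available, use the definition.
        return dft_direct(x)
    m = n // k
    # Split the input into k subsequences of length m (stride k) and recurse.
    subs = [dft(x[r::k]) for r in range(k)]
    # Combine the sub-transforms with twiddle factors.
    return [
        sum(subs[r][q % m] * cmath.exp(-2j * cmath.pi * r * q / n) for r in range(k))
        for q in range(n)
    ]
```

The point of the sketch is that the recursion is chosen at run time from the input size, so one routine covers every length; a generated library would additionally have to choose among several such recursions and tune them to the platform.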
