Automatic implementation and platform adaptation of discrete filtering and wavelet algorithms

Moore's law, with the doubling of the transistor count every 18 months, poses serious challenges to high-performance numerical software designers: how to stay close to the maximum achievable performance on ever-changing and ever-faster hardware technologies? Up-to-date numerical libraries are usually maintained by large teams of expert programmers who hand-tune the code to a specific class of computer platforms, sacrificing portability for performance. Every new generation of processors reopens the cycle of implementing, tuning, and debugging. The SPIRAL system addresses this problem by automatically generating and implementing algorithms for DSP numerical kernels and searching for the best solution on the platform of interest. Using search, SPIRAL adapts code to take advantage of the available platform features, such as the architecture of the memory hierarchy and register banks. As a result, SPIRAL generates high-performance implementations for DSP transforms that are competitive with the best hand-coded numerical libraries provided by hardware vendors. In this thesis, we focus on automatic implementation and platform adaptation of filtering and wavelet kernels, which are at the core of many performance-critical DSP applications. We formulate many well-known algorithms for FIR filters and discrete wavelet transforms (DWTs) using a concise and flexible symbolic mathematical language and integrate them into the SPIRAL system. This enables automatic generation of, and search over, a comprehensive space of competitive algorithms, often leading to complex solutions that a human programmer would rarely consider. Experimental results show that our automatically generated and tuned code for FIR filters and DWTs is competitive with, and sometimes even outperforms, hand-coded numerical libraries provided by hardware vendors. This implies that the richness and the extent of the automatically generated search space can match human ingenuity in achieving high performance.
Our system generates high-quality code for digital filtering and wavelet kernels across most current and compatible future computer platforms and frees software developers from tedious and time-consuming coding at the machine level.
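
The generate-and-search idea can be illustrated with a minimal sketch. It is not SPIRAL's implementation: it assumes only two hand-written FIR variants (a direct form and an FFT-based form) and picks the faster one by timing on the current machine. The names `fir_direct`, `fir_fft`, `CANDIDATES`, and `pick_fastest` are hypothetical and chosen for illustration.

```python
import timeit
import numpy as np

def fir_direct(x, h):
    """Direct-form FIR filter, returning only fully overlapped outputs."""
    n_out = len(x) - len(h) + 1
    y = np.zeros(n_out)
    for k, hk in enumerate(h):
        start = len(h) - 1 - k          # input offset paired with tap k
        y += hk * x[start:start + n_out]
    return y

def fir_fft(x, h):
    """Same filter computed through a zero-padded FFT convolution."""
    n_full = len(x) + len(h) - 1
    spectrum = np.fft.rfft(x, n_full) * np.fft.rfft(h, n_full)
    full = np.fft.irfft(spectrum, n_full)
    return full[len(h) - 1:len(x)]      # keep the fully overlapped part

# Hypothetical candidate set; a real generator would enumerate many more.
CANDIDATES = {"direct": fir_direct, "fft": fir_fft}

def pick_fastest(candidates, x, h, repeats=5):
    """Time each candidate on this machine and return the fastest one's name."""
    timings = {
        name: min(timeit.repeat(lambda f=f: f(x, h), number=1, repeat=repeats))
        for name, f in candidates.items()
    }
    return min(timings, key=timings.get)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)           # input signal
h = rng.standard_normal(16)             # filter taps
best = pick_fastest(CANDIDATES, x, h)
y = CANDIDATES[best](x, h)
```

Which variant wins depends on the signal length, the number of taps, and the machine, which is why the choice is made by measurement rather than fixed in advance.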
