SPIRAL: Code Generation for DSP Transforms
Franz Franchetti | José M. F. Moura | David A. Padua | Manuela M. Veloso | Kang Chen | Markus Püschel | Jeremy R. Johnson | Yevgen Voronenko | Robert W. Johnson | Jianxin Xiong | Nicholas Rizzolo | Bryan Singer | Aca Gacic
[1] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[2] Y. Meyer,et al. Wavelets and Filter Banks , 1991 .
[3] Kang Chen,et al. A self-adapting distributed memory package for fast signal transforms , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[4] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[5] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2004, The Journal of Supercomputing.
[6] Franz Franchetti,et al. Short vector code generation for the discrete Fourier transform , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[7] Yevgen Voronenko,et al. Adaptive Mapping of Linear DSP Algorithms to Fixed-Point Arithmetic , 2004 .
[8] Ephraim Feig,et al. New scaled DCT algorithms for fused multiply/add architectures , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[9] Donald E. Knuth,et al. The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition , 1997 .
[10] K. Steiglitz,et al. Some complexity issues in digital signal processing , 1984 .
[11] David H. Bailey. Unfavorable strides in cache memory systems , 1992 .
[12] Kang Chen,et al. A prototypical self-optimizing package for parallel implementation of fast signal transforms , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[13] L. Torgo. Inductive learning of tree-based regression models , 1999 .
[14] David A. Padua,et al. On the Automatic Parallelization of the Perfect Benchmarks , 1998, IEEE Trans. Parallel Distributed Syst..
[15] Alexander Graham,et al. Kronecker Products and Matrix Calculus: With Applications , 1981 .
[16] Henry Hoffmann,et al. Parallel VSIPL++: An Open Standard Software Library for High-Performance Parallel Signal Processing , 2005, Proceedings of the IEEE.
[17] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[18] Markus Püschel,et al. Cooley-Tukey FFT like algorithms for the DCT , 2003, ICASSP.
[19] Viktor K. Prasanna,et al. Dynamic data layouts for cache-conscious factorization of DFT , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[20] Manuela M. Veloso,et al. Automating the modeling and optimization of the performance of signal transforms , 2002, IEEE Trans. Signal Process..
[21] James C. Hoe,et al. Automatic cost minimization for multiplierless implementations of discrete signal transforms , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[22] David A. Padua,et al. HiLO: High Level Optimization of FFTs , 2004, LCPC.
[23] Scott A. Mahlke,et al. Profile‐guided automatic inline expansion for C programs , 1992, Softw. Pract. Exp..
[24] R. W. Johnson,et al. A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures , 1990 .
[25] Sebastian Egner,et al. Zur algorithmischen Zerlegungstheorie linearer Transformationen mit Symmetrie , 1997 .
[26] Robert Bregovic,et al. Multirate Systems and Filter Banks , 2002 .
[27] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[28] Vivek Sarkar,et al. A comparative study of static and profile-based heuristics for inlining , 2000, Dynamo.
[29] Donald E. Knuth. The art of computer programming: fundamental algorithms , 1969 .
[30] Markus Püschel,et al. Automatic generation of implementations for DSP transforms on fused multiply-add architectures , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[31] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[32] Pawel Hitczenko,et al. Distribution of a class of divide and conquer recurrences arising from the computation of the Walsh-Hadamard transform , 2006, Theor. Comput. Sci..
[33] Manuela M. Veloso,et al. Learning to Construct Fast Signal Processing Implementations , 2002, J. Mach. Learn. Res..
[34] Peter Sestoft,et al. Partial evaluation and automatic program generation , 1993, Prentice Hall international series in computer science.
[35] Andrew G. Dempster,et al. Extended results for minimum-adder constant integer multipliers , 2002, 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353).
[36] Christopher W. Fraser,et al. Engineering a simple, efficient code-generator generator , 1992, LOPL.
[37] José M. F. Moura,et al. Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms , 2004, Int. J. High Perform. Comput. Appl..
[38] Victor Eijkhout,et al. Self-Adapting Linear Algebra Algorithms and Software , 2005, Proceedings of the IEEE.
[39] David A. Padua,et al. Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs , 1991, LCPC.
[40] S. Winograd. Arithmetic complexity of computations , 1980 .
[41] A. W. M. van den Enden,et al. Discrete Time Signal Processing , 1989 .
[42] Funda Ergün. Testing multivariate linear functions: overcoming the generator bottleneck , 1995, STOC '95.
[43] Markus Püschel,et al. In search of the optimal Walsh-Hadamard transform , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[44] José M. F. Moura,et al. Fast Automatic Generation of DSP Algorithms , 2001, International Conference on Computational Science.
[45] Franz Franchetti. Performance Portable Short Vector Transforms , 2003 .
[46] Nicholas J. Higham,et al. INVERSE PROBLEMS NEWSLETTER , 1991 .
[47] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[48] Zhaofang Wen,et al. Automatic Algorithm Recognition and Replacement: A New Approach to Program Optimization , 2000 .
[49] Dragan Mirkovic. Automatic Performance Tuning in the UHFFT Library , 2001, International Conference on Computational Science.
[50] David A. Padua,et al. SPL: a language and compiler for DSP algorithms , 2001, PLDI '01.
[51] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[52] B. Singer,et al. Stochastic Search for Signal Processing Algorithm Optimization , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[53] William H. Press,et al. Numerical recipes in C++: the art of scientific computing, 2nd Edition (C++ ed., print. is corrected to software version 2.10) , 1994 .
[54] Nachum Dershowitz,et al. Chapter 9 – Rewriting , 2001 .
[55] I. Daubechies,et al. Factoring wavelet transforms into lifting steps , 1998 .
[56] Jeremy R. Johnson,et al. Automatic derivation and implementation of fast convolution algorithms , 2004, J. Symb. Comput..
[57] Ephraim Feig,et al. Implementation of Efficient FFT Algorithms on Fused Multiply- Add Architectures , 1993, IEEE Trans. Signal Process..
[58] David A. Padua,et al. Dependence graphs and compiler optimizations , 1981, POPL '81.
[59] Manuela M. Veloso,et al. Learning to Generate Fast Signal Processing Implementations , 2001, ICML.
[60] Kang Su Gatlin,et al. Architecture-Cognizant Divide and Conquer Algorithms , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[61] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[62] Franz Franchetti,et al. Efficient Utilization of SIMD Extensions , 2005, Proceedings of the IEEE.
[63] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[64] Larry Carter,et al. Faster FFTs via architecture-cognizance , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[65] W. Abu-Sufah,et al. Automatic program transformations for virtual memory computers , 1979, 1979 International Workshop on Managing Requirements Knowledge (MARK).
[66] Franz Franchetti,et al. A SIMD vectorizing compiler for digital signal processing algorithms , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[67] W. Press,et al. Numerical Recipes in C++: The Art of Scientific Computing (2nd edn); Numerical Recipes Example Book (C++) (2nd edn); Numerical Recipes Multi-Language Code CD ROM with LINUX or UNIX Single-Screen License Revised Version , 2003 .
[68] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[69] Gang Ren,et al. Is Search Really Necessary to Generate High-Performance BLAS? , 2005, Proceedings of the IEEE.
[70] Paul Feautrier,et al. On the Equivalence of Two Systems of Affine Recurrence Equations (Research Note) , 2002, Euro-Par.
[71] David H. Bailey. Unfavorable Strides in Cache Memory Systems (RNR Technical Report RNR-92-015) , 1995, Sci. Program..
[72] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[73] György E. Révész. Introduction to formal languages , 1983 .
[74] James C. Hoe,et al. Custom-optimized multiplierless implementations of DSP algorithms , 2004, IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004..
[75] David E. Bernholdt,et al. A performance optimization framework for compilation of tensor contraction expressions into parallel , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[76] C. Lu. Implementation of 'multiply-add' FFT algorithms for complex and real data sequences , 1991, 1991 IEEE International Symposium on Circuits and Systems.
[77] Anoop Gupta,et al. Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.
[78] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[79] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[80] Jeremy Johnson,et al. Design, optimization, and implementation of a universal FFT processor , 2000, Proceedings of 13th Annual IEEE International ASIC/SOC Conference (Cat. No.00TH8541).
[81] R. Tolimieri,et al. Algorithms for Discrete Fourier Transform and Convolution , 1989 .
[82] José M. F. Moura,et al. Automatic implementation and platform adaptation of discrete filtering and wavelet algorithms , 2004 .
[83] F. A. Seiler,et al. Numerical Recipes in C: The Art of Scientific Computing , 1989 .