Program Composition and Optimization: An Introduction
Christoph W. Kessler | David A. Padua | Welf Löwe | Markus Püschel
[1] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[2] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[3] M. Püschel, et al. FFT Program Generation for Shared Memory: SMP and Multicore, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[4] Franz Franchetti, et al. A Rewriting System for the Vectorization of Signal Transforms, 2006, VECPAR.
[5] Christoph W. Kessler, et al. Optimized composition of performance-aware parallel components, 2012, Concurr. Comput. Pract. Exp.
[6] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[7] Christoph W. Kessler, et al. A Framework for Performance-Aware Composition of Explicitly Parallel Components, 2007, PARCO.
[8] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.
[9] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[10] Franz Franchetti, et al. How to Write Fast Numerical Code: A Small Introduction, 2007, GTTSE.
[11] Jesper Andersson, et al. Composition and Optimization, 2008.
[12] Michael F. P. O'Boyle, et al. Iterative Compilation, 2002, Embedded Processor Design Challenges.
[13] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[14] David A. Padua, et al. Programming for Locality and Parallelism with Hierarchically Tiled Arrays, 2003, LCPC.
[15] Jesper Andersson, et al. Profile-Guided Composition, 2008, SC@ETAPS.
[16] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[17] Michael F. P. O'Boyle, et al. Portable compiler optimisation across embedded programs and microarchitectures using machine learning, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Franz Franchetti, et al. Formal datapath representation and manipulation for implementing DSP transforms, 2008, 2008 45th ACM/IEEE Design Automation Conference.
[19] Michael F. P. O'Boyle, et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation, 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[20] Markus Püschel, et al. Computer Generation of General Size Linear Transform Libraries, 2009, 2009 International Symposium on Code Generation and Optimization.
[21] David A. Padua, et al. Optimizing sorting with genetic algorithms, 2005, International Symposium on Code Generation and Optimization.
[22] Welf Löwe, et al. Foundations for the integration of scheduling techniques into compilers for parallel languages, 2005, Int. J. Comput. Sci. Eng.
[23] Jesper Andersson, et al. Reconfigurable Scientific Applications on GRID Services, 2005, EGC.
[24] David A. Padua, et al. A dynamically tuned sorting library, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[25] David A. Padua, et al. SPL: a language and compiler for DSP algorithms, 2001, PLDI '01.
[26] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[27] Keith D. Cooper, et al. ACME: adaptive compilation made efficient, 2005, LCTES '05.
[28] Ed F. Deprettere, et al. Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS, 2002.