Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel
Robert J. Harrison | P. Sadayappan | Thomas Henretty | Kevin Stock | Iyyappa Murugandi
[1] Albert Cohen,et al. Polyhedral-Model Guided Loop-Nest Auto-Vectorization , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[2] Chun Chen,et al. Speeding up Nek5000 with autotuning and specialization , 2010, ICS '10.
[3] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[4] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[5] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[6] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[7] Sriram Krishnamoorthy,et al. Parametric multi-level tiling of imperfectly nested loops , 2009, ICS.
[8] Rainer Leupers,et al. A SIMD optimization framework for retargetable compilers , 2009, TACO.
[9] Robert J. Harrison,et al. Multiresolution Quantum Chemistry in Multiwavelet Bases , 2003, International Conference on Computational Science.
[10] Robert J. Harrison,et al. Singular operators in multiwavelet bases , 2004, IBM J. Res. Dev..
[11] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[12] James Demmel,et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[13] Ayal Zaks,et al. Outer-loop vectorization - revisited for short SIMD architectures , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[14] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[15] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[16] G. Beylkin,et al. Multiresolution quantum chemistry in multiwavelet bases: Analytic derivatives for Hartree-Fock and density functional theory. , 2004, The Journal of chemical physics.
[17] Matemática,et al. Society for Industrial and Applied Mathematics , 2010 .
[18] J. Ramanujam,et al. Parameterized tiling revisited , 2010, CGO '10.
[19] Sanjay V. Rajopadhye,et al. Parameterized tiled loops for free , 2007, PLDI '07.
[20] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[21] Gregory Beylkin,et al. Multiresolution quantum chemistry: basic theory and initial applications. , 2004, The Journal of chemical physics.
[22] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[23] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.