Auto-tuning a Matrix Routine for High Performance
[1] Robert A. van de Geijn,et al. Anatomy of high-performance matrix multiplication , 2008, TOMS.
[2] Brian T. Smith,et al. Matrix Eigensystem Routines — EISPACK Guide , 1974, Lecture Notes in Computer Science.
[3] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[4] Jack Dongarra,et al. LAPACK: a portable linear algebra library for high-performance computers , 1990, SC.
[5] Kei Hiraki,et al. The performance of GRAPE-DR for dense matrix operations , 2011, ICCS.
[6] William Gropp,et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries , 1997, SciTools.
[7] J. Demmel,et al. Sun Microsystems , 1996 .
[8] Chun Chen,et al. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology , 2010, Software Automatic Tuning, From Concepts to State-of-the-Art Results.
[9] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[10] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[11] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[12] Gang Ren,et al. Is Search Really Necessary to Generate High-Performance BLAS? , 2005, Proceedings of the IEEE.
[13] Rune Erlend Jensen,et al. Techniques and Tools for Optimizing Codes on Modern Architectures: A Low-Level Approach , 2009 .
[14] B. S. Garbow,et al. Matrix Eigensystem Routines — EISPACK Guide , 1974, Lecture Notes in Computer Science.
[15] Tamara G. Kolda,et al. An overview of the Trilinos project , 2005, TOMS.
[16] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[17] James Demmel,et al. LAPACK: A portable linear algebra library for high-performance computers , 1990, Proceedings SUPERCOMPUTING '90.
[18] Gang Ren,et al. Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization , 2005, LCPC.
[19] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[20] Jack Dongarra,et al. LINPACK Users' Guide , 1987 .
[21] Naohito Nakasato,et al. A fast GEMM implementation on the cypress GPU , 2011, PERV.
[22] Anne C. Elster,et al. Basic Matrix Subprograms for Distributed Memory Systems , 1990, Proceedings of the Fifth Distributed Memory Computing Conference, 1990.