Statistical Models for Automatic Performance Tuning
[1] Jeremy D. Frens,et al. Language support for Morton-order matrices , 2001, PPoPP '01.
[2] John R. Rice,et al. The Algorithm Selection Problem , 1976, Adv. Comput..
[3] P. Bickel,et al. Mathematical Statistics: Basic Ideas and Selected Topics , 1977 .
[4] Nayda G. Santiago,et al. A statistical approach for the analysis of the relation between low-level performance information, the code, and the environment , 2002, Proceedings. International Conference on Parallel Processing Workshop.
[5] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[6] Dragan Mirkovic,et al. An adaptive software library for fast Fourier transforms , 2000, ICS '00.
[7] Keith H. Randall,et al. Denali: a goal-directed superoptimizer , 2002, PLDI '02.
[8] David I. August,et al. Compiler optimization-space exploration , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[9] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[10] G. Simons. Great Expectations: Theory of Optimal Stopping , 1973 .
[11] Bo Kågström,et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark , 1998, TOMS.
[12] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[13] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[14] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[15] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[16] Brendan J. Frey,et al. Graphical Models for Machine Learning and Digital Communication , 1998 .
[17] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[18] Ken Kennedy,et al. Transforming loops to recursion for multi-level memory hierarchies , 2000, PLDI '00.
[19] Kang Su Gatlin,et al. Architecture-Cognizant Divide and Conquer Algorithms , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[20] Michail G. Lagoudakis,et al. Algorithm Selection using Reinforcement Learning , 2000, ICML.
[21] Michael D. Smith,et al. Overcoming the Challenges to Feedback-Directed Optimization , 2000, Dynamo.
[22] Henry Massalin. Superoptimizer: a look at the smallest program , 1987, ASPLOS 1987.
[23] Z. Birnbaum. Numerical Tabulation of the Distribution of Kolmogorov's Statistic for Finite Sample Size , 1952 .
[24] Katherine A. Yelick,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.
[25] Paul Vinson Stodghill,et al. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes , 1997 .
[26] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[27] Donald E. Knuth,et al. An empirical study of FORTRAN programs , 1971, Softw. Pract. Exp..
[28] José M. F. Moura,et al. Fast Automatic Generation of DSP Algorithms , 2001, International Conference on Computational Science.
[29] Todd L. Veldhuizen,et al. Arrays in Blitz++ , 1998, ISCOPE.
[30] James Demmel,et al. The PHiPAC v1.0 Matrix-Multiply Distribution , 1998 .
[31] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[32] Oege de Moor,et al. Compiling embedded languages , 2000, Journal of Functional Programming.
[33] Jeffrey Scott Vitter,et al. Efficient sorting using registers and caches , 2000, JEA.
[34] Manuela M. Veloso,et al. Learning to Predict Performance from Formula Modeling and Training Data , 2000, ICML.
[35] Paul H. J. Kelly,et al. Delayed Evaluation, Self-optimising Software Components as a Programming Model , 2002, Euro-Par.
[36] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[37] John C. Platt,et al. Fast training of support vector machines using sequential minimal optimization , 1999, Advances in Kernel Methods.
[38] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPoPP '97.
[39] Eric A. Brewer,et al. High-level optimization via automated statistical modeling , 1995, PPoPP '95.
[40] T. Kisuki,et al. Iterative Compilation in Program Optimization , 2000 .
[41] Fred G. Gustavson,et al. LAWRA: Linear Algebra with Recursive Algorithms , 2000, PARA.
[42] Robert A. van de Geijn,et al. A Family of High-Performance Matrix Multiplication Algorithms , 2001, International Conference on Computational Science.
[43] Dror Rawitz,et al. The hardness of cache conscious data placement , 2002, POPL '02.
[44] Aart J. C. Bik,et al. Advanced Compiler Optimizations for Sparse Computations , 1995, J. Parallel Distributed Comput..
[45] Jeremy G. Siek,et al. A Rational Approach to Portable High Performance: The Basic Linear Algebra Instruction Set (BLAIS) and the Fixed Algorithm Size Template (FAST) Library , 1998, ECOOP Workshops.
[46] Richard Kenner,et al. Eliminating branches using a superoptimizer and the GNU C compiler , 1992, PLDI '92.
[47] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[48] Michael Voss,et al. ADAPT: Automated De-coupled Adaptive Program Transformation , 2000, Proceedings 2000 International Conference on Parallel Processing.
[49] David E. Bernholdt,et al. A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[50] E. Im,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.
[51] Dennis Gannon,et al. Active Libraries: Rethinking the roles of compilers and libraries , 1998, ArXiv.
[52] William Gropp,et al. MPI-2: Extending the Message-Passing Interface , 1996, Euro-Par, Vol. I.
[53] Michael Lucks,et al. Automated selection of mathematical software , 1992, TOMS.
[54] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[55] Larry Carter,et al. A Modal Model of Memory , 2001, International Conference on Computational Science.
[56] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[57] J. R. Johnson,et al. Implementation of Strassen's Algorithm for Matrix Multiplication , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[58] Larry Carter,et al. Guiding program transformations with modal performance models , 2000 .
[59] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
[60] John E. Savage. Extending the Hong-Kung Model to Memory Hierarchies , 1995, COCOON.
[61] James Demmel,et al. Statistical Modeling of Feedback Data in an Automatic Tuning System , 2000 .
[62] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[63] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[64] G. E. Noether. Note on the Kolmogorov statistic in the discrete case , 1963 .
[65] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[66] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[67] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science.