Automatic performance tuning of sparse matrix kernels
[1] A. Kolmogoroff. Confidence Limits for an Unknown Distribution Function , 1941 .
[2] Z. Birnbaum. Numerical Tabulation of the Distribution of Kolmogorov's Statistic for Finite Sample Size , 1952 .
[3] G. E. Noether. Note on the Kolmogorov statistic in the discrete case , 1963 .
[4] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[5] J. W. Walker,et al. Direct solutions of sparse network equations by optimally ordered triangular factorization , 1967 .
[6] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[7] Donald E. Knuth,et al. An empirical study of FORTRAN programs , 1971, Softw. Pract. Exp..
[8] David Siegmund,et al. Great expectations: The theory of optimal stopping , 1971 .
[9] D. Rose. A Graph-Theoretic Study of the Numerical Solution of Sparse Positive Definite Systems of Linear Equations , 1972 .
[10] Udo W. Pooch,et al. A Survey of Indexing Techniques for Sparse Matrices , 1973, CSUR.
[11] A. George. Nested Dissection of a Regular Finite Element Mesh , 1973 .
[12] John R. Rice,et al. The Algorithm Selection Problem , 1976, Adv. Comput..
[13] P. Bickel,et al. Mathematical Statistics: Basic Ideas and Selected Topics , 1977 .
[14] Fred G. Gustavson,et al. Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition , 1978, TOMS.
[15] Alan George,et al. The Design of a User Interface for a Sparse Matrix Package , 1979, TOMS.
[16] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[17] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[18] Susan L. Graham,et al. Gprof: A call graph execution profiler , 1982, SIGPLAN '82.
[19] Thomas R. Gross,et al. Postpass Code Optimization of Pipeline Constraints , 1983, TOPL.
[20] Anne Lohrli. Chapman and Hall , 1985 .
[21] John R. Rice,et al. Solving elliptic problems using ELLPACK , 1985, Springer series in computational mathematics.
[22] Katherine Yelick,et al. Performance Optimizations and Bounds for Sparse Symmetric Matrix-Multiple Vector Multiply , 2003 .
[23] I. Duff,et al. Direct Methods for Sparse Matrices , 1987 .
[24] Henry Massalin. Superoptimizer: a look at the smallest program , 1987, ASPLOS 1987.
[25] J. Rice. Mathematical Statistics and Data Analysis , 1988 .
[26] Thomas F. Coleman,et al. A parallel triangular solver for distributed-memory multiprocessor , 1988 .
[27] Ken Kennedy,et al. Estimating Interlock and Improving Balance for Pipelined Architectures , 1988, J. Parallel Distributed Comput..
[28] Thomas S. Ferguson,et al. Who Solved the Secretary Problem? , 1989 .
[29] Y. Saad,et al. Krylov Subspace Methods on Supercomputers , 1989 .
[30] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[31] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[32] Joel H. Saltz,et al. Run-Time Parallelization and Scheduling of Loops , 1991, IEEE Trans. Computers.
[33] Timothy A. Davis,et al. An Unsymmetric-pattern Multifrontal Method for Sparse LU Factorization , 1993 .
[34] Scott A. Mahlke,et al. Using profile information to assist classic code optimizations , 1991, Softw. Pract. Exp..
[35] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[36] Rafael Hector Saavedra-Barrera,et al. CPU performance evaluation and execution time prediction using narrow spectrum benchmarking , 1992 .
[37] Richard Kenner,et al. Eliminating branches using a superoptimizer and the GNU C compiler , 1992, PLDI '92.
[38] Anoop Gupta,et al. Parallel ICCG on a hierarchical memory multiprocessor - Addressing the triangular solve bottleneck , 1990, Parallel Comput..
[39] Michael Lucks,et al. Automated selection of mathematical software , 1992, TOMS.
[40] John R. Gilbert,et al. Sparse Matrices in MATLAB: Design and Implementation , 1992, SIAM J. Matrix Anal. Appl..
[41] David H. Bailey,et al. NAS parallel benchmark results , 1992, Proceedings Supercomputing '92.
[42] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[43] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[44] Fernando L. Alvarado,et al. Optimal Parallel Solution of Sparse Triangular Systems , 1993, SIAM J. Sci. Comput..
[45] Alexander A. Stepanov,et al. Algorithm‐oriented generic libraries , 1994, Softw. Pract. Exp..
[46] Mark T. Jones,et al. Scalable Iterative Solution of Sparse Linear Systems , 1994, Parallel Comput..
[47] S. Cohn. Assessing the Effects of Data Selection with the DAO Physical-Space Statistical Analysis System , 1994 .
[48] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[49] Susan T. Dumais,et al. Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..
[50] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[51] Eunice E. Santos. Solving triangular linear systems in parallel using substitution , 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.
[52] Edward Rothberg,et al. Alternatives for Solving Sparse Triangular Systems on Distributed-Memory Multiprocessors , 1995, Parallel Comput..
[53] Weichung Wang,et al. Adaptive use of iterative methods in interior point methods for linear programming , 1995 .
[54] Vipin Kumar,et al. Parallel Algorithms for Forward Elimination and Backward Substitution in Direct Solution of Sparse L , 1995 .
[55] John E. Savage. Extending the Hong-Kung Model to Memory Hierarchies , 1995, COCOON.
[56] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[57] Eric A. Brewer,et al. High-level optimization via automated statistical modeling , 1995, PPOPP '95.
[58] Aart J. C. Bik,et al. Advanced Compiler Optimizations for Sparse Computations , 1995, J. Parallel Distributed Comput..
[59] Rajiv Gupta,et al. Adaptive loop transformations for scientific programs , 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.
[60] Preston Briggs. Sparse matrix multiplication , 1996, SIGP.
[61] William Gropp,et al. MPI-2: Extending the Message-Passing Interface , 1996, Euro-Par, Vol. I.
[62] J. R. Johnson,et al. Implementation of Strassen's Algorithm for Matrix Multiplication , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[63] Josep-Lluís Larriba-Pey,et al. Block algorithms for sparse matrix computations on high performance workstations , 1996, ICS '96.
[64] Craig C. Douglas,et al. Caching in with Multigrid Algorithms: Problems in Two Dimensions , 1996, Parallel Algorithms Appl..
[65] Bowen Alpern,et al. Hierarchical Tiling: A Methodology for High Performance , 1996 .
[66] William Gropp,et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries , 1997, SciTools.
[67] Sandra Fillebrown,et al. The MathWorks' MATLAB , 1996 .
[68] Patrick R. Amestoy,et al. An Approximate Minimum Degree Ordering Algorithm , 1996, SIAM J. Matrix Anal. Appl..
[69] Richard F. Barrett,et al. Matrix Market: a web resource for test matrix collections , 1996, Quality of Numerical Software.
[70] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPL.
[71] S. Bikhchandani,et al. Optimal Search with Learning , 2011 .
[72] James R. Larus,et al. Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[73] Paul Vinson Stodghill,et al. A Relational Approach to the Automatic Generation of Sequential Sparse matrix Codes , 1997 .
[74] J. Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997 .
[75] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[76] Martin C. Rinard,et al. Dynamic feedback: an effective technique for adaptive computing , 1997, PLDI '97.
[77] Fast Nested Dissection for Finite Element Meshes , 1997, SIAM J..
[78] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .
[79] Sivan Toledo,et al. Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..
[80] John R. Gilbert,et al. Aspect-Oriented Programming of Sparse Matrix Code , 1997, ISCOPE.
[81] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[82] Vipin Kumar,et al. A high performance two dimensional scalable parallel algorithm for solving sparse triangular systems , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[83] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[84] Michael B. Giles,et al. Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines , 1997 .
[85] Mark Leone,et al. Dynamo: A Staged Compiler Architecture for Dynamic Program Optimization , 1997 .
[86] Florin Dobrian,et al. Object-Oriented Design for Sparse Direct Solvers , 1998, ISCOPE.
[87] Aart J. C. Bik,et al. The automatic generation of sparse primitives , 1998, TOMS.
[88] Jack Dongarra,et al. Developing numerical libraries in Java , 1998 .
[89] Stefan Andersson,et al. RS/6000 Scientific and Technical Computing: POWER3 Introduction and Tuning Guide , 1998 .
[90] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[91] Jeremy G. Siek,et al. A Rational Approach to Portable High Performance: The Basic Linear Algebra Instruction Set (BLAIS) and the Fixed Algorithm Size Template (FAST) Library , 1998, ECOOP Workshops.
[92] Todd L. Veldhuizen,et al. Arrays in Blitz++ , 1998, ISCOPE.
[93] Bo Kågström,et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark , 1998, TOMS.
[94] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[95] James Demmel,et al. The PHiPAC v1.0 Matrix-Multiply Distribution , 1998 .
[96] Brendan J. Frey,et al. Graphical Models for Machine Learning and Digital Communication , 1998 .
[97] Kanad Ghose,et al. Caching-efficient multithreaded fast multiplication of sparse matrices , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[98] Clark D. Thomborson,et al. Data cache parameter measurements , 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).
[99] Edith Cohen,et al. Structure Prediction and Computation of Sparse Matrix Products , 1998, J. Comb. Optim..
[100] Dennis Gannon,et al. Active Libraries: Rethinking the roles of compilers and libraries , 1998, ArXiv.
[101] G. E. Moore. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[102] Mithuna Thottethodi,et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[103] S. Cohn,et al. Assessing the Effects of Data Selection with the DAO Physical-Space Statistical Analysis System* , 1998 .
[104] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[105] Charles Consel,et al. Tempo: specializing systems applications and beyond , 1998, CSUR.
[106] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[107] James Demmel,et al. Multigrid equation solvers for large-scale nonlinear finite element simulations , 1999 .
[108] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[109] Dawson R. Engler,et al. C and tcc: a language and compiler for dynamic code generation , 1999, TOPL.
[110] Roman Geus,et al. Towards a fast parallel sparse matrix-vector multiplication , 2000, PARCO.
[111] Cleve Ashcraft,et al. SPOOLES: An Object-Oriented Sparse Matrix Library , 1999, PPSC.
[112] Craig S. K. Clapp,et al. Instruction-level Parallelism in AES Candidates , 1999 .
[113] James Demmel,et al. A Supernodal Approach to Sparse Partial Pivoting , 1999, SIAM J. Matrix Anal. Appl..
[114] Paul van der Mark,et al. Using Iterative Compilation for Managing Software Pipeline-Unrolling Trade-offs , 1999 .
[115] Francisco F. Rivera,et al. Modeling and Improving Locality for Irregular Problems: Sparse Matrix-Vector Product on Cache Memories as a Cache Study , 1999, HPCN Europe.
[116] Vipin Kumar,et al. PSPASES: An Efficient and Scalable Parallel Sparse Direct Solver , 1999, PPSC.
[117] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.
[118] A. Pinar,et al. Improving Performance of Sparse Matrix-Vector Multiplication , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[119] John C. Platt,et al. Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .
[120] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[121] Keshav Pingali,et al. A case for source-level transformations in MATLAB , 1999, DSL '99.
[122] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[123] Taher H. Haveliwala. Efficient Computation of PageRank , 1999 .
[124] Michael T. Heath,et al. Performance of Parallel Sparse Triangular Solution , 1999 .
[125] Emilio L. Zapata,et al. Automatic analytical modeling for the estimation of cache misses , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[126] Aart J. C. Bik,et al. Automatic Nonzero Structure Analysis , 1999, SIAM J. Comput..
[127] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.
[128] Katherine A. Yelick,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.
[129] E. Im,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.
[130] Jon Kleinberg,et al. Authoritative sources in a hyperlinked environment , 1999, SODA '98.
[131] Kang Su Gatlin,et al. Architecture-Cognizant Divide and Conquer Algorithms , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[132] Michael Voss,et al. ADAPT: Automated De-coupled Adaptive Program Transformation , 2000, Proceedings 2000 International Conference on Parallel Processing.
[133] John Worley,et al. AES Finalists on PA-RISC and IA-64: Implementations & Performance , 2000, AES Candidate Conference.
[134] Y. Saad,et al. Iterative solution of linear systems in the 20th century , 2000 .
[135] Lawrence E. Bassham. Efficiency Testing of ANSI C Implementations of Round 2 Candidate Algorithms for the Advanced Encryption Standard , 2000, AES Candidate Conference.
[136] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[137] Naren Ramakrishnan,et al. Note on generalization in experimental algorithmics , 2000, TOMS.
[138] James Demmel,et al. Statistical Modeling of Feedback Data in an Automatic Tuning System , 2000 .
[139] C. F. Jeff Wu,et al. Experiments: Planning, Analysis, and Parameter Design Optimization , 2000 .
[140] Dragan Mirkovic,et al. An adaptive software library for fast Fourier transforms , 2000, ICS '00.
[141] Bryan Weeks,et al. Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms , 2000, AES Candidate Conference.
[142] Michail G. Lagoudakis,et al. Algorithm Selection using Reinforcement Learning , 2000, ICML.
[143] Andy Nisbet,et al. GAPS: Iterative Feedback Directed Parallelisation Using Genetic Algorithms , 2000 .
[144] Jack J. Dongarra,et al. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[145] William Gropp,et al. Latency, bandwidth, and concurrent issue limitations in high-performance CFD , 2000 .
[146] M. Challacombe. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory , 2000 .
[147] Jeffrey Scott Vitter,et al. Efficient Sorting Using Registers and Caches , 2000, Algorithm Engineering.
[148] Fred G. Gustavson,et al. LAWRA: Linear Algebra with Recursive Algorithms , 2000, PARA.
[149] Michael D. Smith,et al. Overcoming the Challenges to Feedback-Directed Optimization , 2000, Dynamo.
[150] Andrei Z. Broder,et al. Graph structure in the Web , 2000, Comput. Networks.
[151] T. Kisuki,et al. Iterative Compilation in Program Optimization , 2000 .
[152] Manuela M. Veloso,et al. Learning to Predict Performance from Formula Modeling and Training Data , 2000, ICML.
[153] Bruce Schneier,et al. A Performance Comparison of the Five AES Finalists , 2000, AES Candidate Conference.
[154] Keshav Pingali,et al. A Framework for Sparse Matrix Code Synthesis from High-level Specifications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[155] Keshav Pingali,et al. Next-generation generic programming and its application to sparse matrix computations , 2000, ICS '00.
[156] Matthew Arnold,et al. Adaptive Optimization in the Jalapeño JVM: The Controller's Analytical Model , 2000 .
[157] James C. Browne,et al. Compositional Development of Performance Models in Poems , 2000, Int. J. High Perform. Comput. Appl..
[158] Richard Weiss,et al. A Comparison of AES Candidates on the Alpha 21264 , 2000, AES Candidate Conference.
[159] Michele Colajanni,et al. PSBLAS: a library for parallel linear algebra computation on sparse matrices , 2000, TOMS.
[160] Michael A. Bender,et al. Cache-oblivious B-trees , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.
[161] Naren Ramakrishnan,et al. PYTHIA-II: a knowledge/database system for managing performance data and recommending scientific software , 2000, TOMS.
[162] Markus Mock,et al. DyC: an expressive annotation-directed dynamic compiler for C , 2000, Theor. Comput. Sci..
[163] C. Thomborson,et al. Measuring Data Cache and TLB Parameters Under Linux , 2000 .
[164] Ken Kennedy,et al. Transforming loops to recursion for multi-level memory hierarchies , 2000, PLDI '00.
[165] Gerd Heber,et al. Self‐avoiding walks over adaptive unstructured grids , 2000 .
[166] Keshav Pingali,et al. The Bernoulli Generic Matrix Library , 2000 .
[167] Ulrich Rüde,et al. Cache Optimization for Structured and Unstructured Grid Multigrid , 2000 .
[168] Fumihiko Sano,et al. Performance Evaluation of AES Finalists on the High-End Smart Card , 2000, AES Candidate Conference.
[169] Siddhartha Chatterjee,et al. Cache-Efficient Multigrid Algorithms , 2001, Int. J. High Perform. Comput. Appl..
[170] Kunle Olukotun,et al. High Bandwidth On-Chip Cache Design , 2001, IEEE Trans. Computers.
[171] Jeremy D. Frens,et al. Language support for Morton-order matrices , 2001, PPoPP '01.
[172] Patrick Amestoy,et al. A Fully Asynchronous Multifrontal Solver Using Distributed Dynamic Scheduling , 2001, SIAM J. Matrix Anal. Appl..
[173] José M. F. Moura,et al. Fast Automatic Generation of DSP Algorithms , 2001, International Conference on Computational Science.
[174] Joseph L. Hellerstein,et al. Using Control Theory to Achieve Service Level Objectives In Performance Management , 2001, 2001 IEEE/IFIP International Symposium on Integrated Network Management Proceedings. Integrated Network Management VII. Integrated Management Strategies for the New Millennium (Cat. No.01EX470).
[175] James Demmel,et al. Statistical Models for Automatic Performance Tuning , 2001, International Conference on Computational Science.
[176] Larry Carter,et al. A Modal Model of Memory , 2001, International Conference on Computational Science.
[177] Roldan Pozo,et al. NIST sparse BLAS user's guide , 2001 .
[178] James Demmel,et al. Preconditioning sparse matrices for computing eigenvalues and solving linear systems of equations , 2001 .
[179] Greg M. Henry,et al. Flexible High-Performance Matrix Multiply via a Self-Modifying Runtime Code , 2001 .
[180] Larry Carter,et al. Rescheduling for Locality in Sparse Matrix Computations , 2001, International Conference on Computational Science.
[181] K. Cooper,et al. Compilation Order Matters , 2001 .
[182] William Kahan,et al. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum , 2001 .
[183] Andy P. Nisbet,et al. Towards Retargettable Compilers — Feedback Directed Compilation Using Genetic Algorithms ( Work in Progress ) , 2001 .
[184] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[185] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[186] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[187] Victor Eijkhout,et al. Recursive approach in sparse matrix LU factorization , 2001, Sci. Program..
[188] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[189] George V. Meghabghab,et al. Google's web page ranking applied to different topological web graph structures , 2001, J. Assoc. Inf. Sci. Technol..
[190] Fred G. Gustavson,et al. A recursive formulation of Cholesky factorization of a matrix in packed storage , 2001, TOMS.
[191] D. Tafti. GenIDLEST: A Scalable Parallel Computational Tool for Simulating Complex Turbulent Flows , 2001, Fluids Engineering.
[192] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[193] Michael I. Jordan,et al. Link Analysis, Eigenvectors and Stability , 2001, IJCAI.
[194] Dragan Mirkovic,et al. Automatic Performance Tuning in the UHFFT Library , 2001, International Conference on Computational Science.
[195] Laura Carrington,et al. Modeling application performance by convolving machine signatures with application profiles , 2001 .
[196] Nayda G. Santiago,et al. A statistical approach for the analysis of the relation between low-level performance information, the code, and the environment , 2002, Proceedings. International Conference on Parallel Processing Workshop.
[197] James Demmel,et al. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[198] Elizabeth R. Jessup,et al. Toward Memory-Efficient Linear Solvers , 2002, VECPAR.
[199] L. Kish. End of Moore's law: thermal (noise) death of integration in micro and nano electronics , 2002 .
[200] David Parello,et al. On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance — Matrix-Multiply Revisited , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[201] Jeffrey S. Vetter,et al. Scalable Analysis Techniques for Microprocessor Performance Counter Metrics , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[202] Paul H. J. Kelly,et al. Delayed Evaluation, Self-optimising Software Components as a Programming Model , 2002, Euro-Par.
[203] David E. Bernholdt,et al. A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[204] Iain S. Duff,et al. An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum , 2002, TOMS.
[205] Pedro C. Diniz,et al. A compiler approach to fast hardware design space exploration in FPGA-based systems , 2002, PLDI '02.
[206] Daniel A. Reed,et al. Markov model prediction of I/O requests for scientific applications , 2002, ICS '02.
[207] J. Demmel,et al. An updated set of basic linear algebra subprograms (BLAS) , 2002, TOMS.
[208] Keith H. Randall,et al. Denali: a goal-directed superoptimizer , 2002, PLDI '02.
[209] Jorge J. Moré,et al. Benchmarking optimization software with performance profiles (DOI 10.1007/s101070100263) , 2001 .
[210] J. Darcy. Finding a Fast Quicksort Implementation for Java , 2002 .
[211] Katherine Yelick,et al. Automatic Performance Tuning and Analysis of Sparse Triangular Solve , 2002 .
[212] Christoph W. Ueberhuber,et al. Cache Oblivious High Performance Algorithms for Matrix Multiplication , 2002 .
[213] Gerhard Wellein,et al. Fast Sparse Matrix-Vector Multiplication for TeraFlop/s Computers , 2002, VECPAR.
[214] Jeffrey K. Hollingsworth,et al. SIGMA: A Simulator Infrastructure to Guide Memory Analysis , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[215] Sivan Toledo,et al. Nested-Dissection Orderings for Sparse LU with Partial Pivoting , 2002, SIAM J. Matrix Anal. Appl..
[216] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[217] Jean-Guillaume Dumas,et al. Finite field linear algebra subroutines , 2002, ISSAC '02.
[218] David A. Padua,et al. MaJIC: compiling MATLAB for speed and responsiveness , 2002, PLDI '02.
[219] Dror Rawitz,et al. The hardness of cache conscious data placement , 2002, POPL '02.
[220] Eli Upfal,et al. Using PageRank to Characterize Web Structure , 2002, COCOON.
[221] Sanjukta Bhowmick,et al. A Combinatorial Scheme for Developing Efficient Composite Solvers , 2002, International Conference on Computational Science.
[222] Masha Sosonkina,et al. Parallel Iterative Methods in Modern Physical Applications , 2002, International Conference on Computational Science.
[223] David E. Bernholdt,et al. Space-time trade-off optimization for a class of electronic structure calculations , 2002, PLDI '02.
[224] A. Rozga,et al. Maternal sensitivity and attachment in atypical groups , 2002, Advances in child development and behavior.
[225] Dror Irony,et al. Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky , 2002, International Conference on Computational Science.
[226] Jasmine Novak,et al. PageRank Computation and the Structure of the Web: Experiments and Algorithms , 2002 .
[227] John M. Mellor-Crummey,et al. Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library , 2002, ICS '02.
[228] Paul N. Hilfinger,et al. Better Tiling and Array Contraction for Compiling Scientific Programs , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[229] Iain S. Duff,et al. Algorithm 818: A reference model implementation of the sparse BLAS in fortran 95 , 2002, TOMS.
[230] Jeffrey Scott Vitter,et al. Efficient sorting using registers and caches , 2000, JEAL.
[231] M. Gilli,et al. Solving finite difference schemes arising in trivariate option pricing , 2002 .
[232] Taher H. Haveliwala. Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..
[233] Gerth Stølting Brodal,et al. Cache oblivious search trees via binary trees of small height , 2001, SODA '02.
[234] C. Leopold. Tight Bounds on Capacity Misses for 3D Stencil Codes , 2002 .
[235] Jeffrey S. Vetter,et al. An Empirical Performance Evaluation of Scalable Scientific Applications , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[236] Michael A. Bender,et al. Cache-oblivious priority queue and graph algorithm applications , 2002, STOC '02.
[237] Zizhong Chen,et al. Self-Adapting Software for Numerical Linear Algebra Library Routines on Clusters , 2003, International Conference on Computational Science.
[238] Gene H. Golub,et al. Extrapolation methods for accelerating PageRank computations , 2003, WWW '03.
[239] Li Chen,et al. Parallel Finite Element Analysis Platform for the Earth Simulator: GeoFEM , 2003, International Conference on Computational Science.
[240] David I. August,et al. Compiler optimization-space exploration , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[241] Zizhong Chen,et al. Self-adapting software for numerical linear algebra and LAPACK for clusters , 2003, Parallel Comput..
[242] Sally A. McKee,et al. METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[243] James W. Thomas. Inlining of Mathematical Functions in HP-UX for Itanium® 2 , 2003, CGO.
[244] Jennifer Widom,et al. Scaling personalized web search , 2003, WWW '03.
[245] Gene H. Golub,et al. Exploiting the Block Structure of the Web for Computing , 2003 .
[246] Vijay Kumar,et al. Efficient galois field arithmetic on SIMD architectures , 2003, SPAA '03.
[247] Victor Eijkhout,et al. Self-Adapting Numerical Software and Automatic Tuning of Heuristics , 2003, International Conference on Computational Science.
[248] John A. Tomlin,et al. A new paradigm for ranking pages on the world wide web , 2003, WWW '03.
[249] Taher H. Haveliwala,et al. The Second Eigenvalue of the Google Matrix , 2003 .
[250] Larry Carter,et al. Compile-time composition of run-time data and iteration reorderings , 2003, PLDI '03.
[251] Michael Franz,et al. Continuous program optimization: A case study , 2003, TOPL.
[252] J. Shalf,et al. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[253] Jeremy D. Frens,et al. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.
[254] Yunheung Paek,et al. Finding effective optimization phase sequences , 2003 .
[255] Jean-Francois Collard,et al. Optimizations to prevent cache penalties for the Intel® Itanium® 2 Processor , 2003, CGO.
[256] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[257] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[258] James W. Thomas. Inlining of mathematical functions in HP-UX for Itanium® 2 , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[259] Saman P. Amarasinghe,et al. Meta optimization: improving compiler heuristics with machine learning , 2003, PLDI '03.
[260] Chandra Krintz. Coupling on-line and off-line profile information to improve program performance , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[261] Derek Bruening,et al. An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..
[262] S. Kirkland. Conditioning properties of the stationary distribution for a Markov chain , 2003 .
[263] Pedro C. Diniz. A Compiler Approach to Performance Prediction Using Empirical-Based Modeling , 2003, International Conference on Computational Science.
[264] Bernhard Schölkopf,et al. A tutorial on support vector regression , 2004, Stat. Comput..
[265] Shang-Hua Teng,et al. Recovering Mesh Geometry from a Stiffness Matrix , 2002, Numerical Algorithms.
[266] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[267] Timothy A. Davis,et al. A column approximate minimum degree ordering algorithm , 2000, TOMS.
[268] Keith D. Cooper,et al. Adaptive Optimizing Compilers for the 21st Century , 2002, The Journal of Supercomputing.
[269] Darren J. Wilkinson,et al. A sparse matrix approach to Bayesian computation in large linear models , 2004, Comput. Stat. Data Anal..
[270] Vivek Sarkar. Optimized Unrolling of Nested Loops , 2004, International Journal of Parallel Programming.
[271] Larry Carter,et al. Quantifying the Multi-Level Nature of Tiling Interactions , 1997, International Journal of Parallel Programming.
[272] Allen,et al. Optimizing Compilers for Modern Architectures , 2004 .
[273] A data locality optimizing algorithm , 2004, SIGP.
[274] Robert A. van de Geijn,et al. A Family of High-Performance Matrix Multiplication Algorithms , 2004, PARA.
[275] Taher H. Haveliwala,et al. Adaptive methods for the computation of PageRank , 2004 .
[276] Amy Nicole Langville,et al. A Survey of Eigenvector Methods for Web Information Retrieval , 2005, SIAM Rev..
[277] Elizabeth R. Jessup,et al. A Technique for Accelerating the Convergence of Restarted GMRES , 2005, SIAM J. Matrix Anal. Appl..
[278] Bernard Philippe,et al. Numerical Methods in Markov Chain Modeling , 1992, Oper. Res..