Quantitative performance modeling of scientific computations and creating locality in numerical algorithms
[1] Hans Riesel,et al. A note on large linear systems , 1956 .
[2] Jon Louis Bentley,et al. Writing efficient programs , 1982 .
[3] Ulrich Kremer,et al. A static performance estimator to guide data partitioning decisions , 1991 .
[4] Wilbur H. Highleyman. Performance Analysis of Transaction Processing Systems , 1989, SIGMETRICS Perform. Evaluation Rev..
[5] G. Meurant. The block preconditioned conjugate gradient method on vector computers , 1984 .
[6] Cleve B. Moler,et al. Matrix computations with Fortran and paging , 1972, CACM.
[7] W. Daniel Hillis,et al. The network architecture of the Connection Machine CM-5 (extended abstract) , 1992, SPAA '92.
[8] Arno Formella,et al. Isolating the Reasons for the Performance of Parallel Machines on Numerical Programs , 1994, Automatic Parallelization.
[9] V. Rokhlin. Rapid solution of integral equations of classical potential theory , 1985 .
[10] Thomas J. LeBlanc,et al. Parallel performance prediction using lost cycles analysis , 1994, Proceedings of Supercomputing '94.
[11] John R. Gilbert,et al. Modeling Data-Parallel Programs with the Alignment-Distribution Graph , 1994 .
[12] Gary L. Miller,et al. A unified geometric approach to graph separators , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.
[13] N. Brenner. Fast Fourier transform of externally stored data , 1969 .
[14] David Chaiken,et al. Mechanisms and interfaces for software-extended coherent shared memory , 1994 .
[15] A. Sangiovanni-Vincentelli,et al. Algorithms For Drift-Diffusion Device Simulation Using Massively Parallel Processors , 1993, [Proceedings] 1993 International Workshop on VLSI Process and Device Modeling (1993 VPAD).
[16] Shirley Dex,et al. On the operation and management of the JR comprehensive passenger sales system (MARS) , 1991 .
[17] M. M. Stabrowski. A block equation solver for large unsymmetric linear equation systems with dense coefficient matrices , 1987 .
[18] I. Duff,et al. The effect of ordering on preconditioned conjugate gradients , 1989 .
[19] Gilles Cantin. An equation solver of very large capacity , 1971 .
[20] Satish Rao,et al. Shallow excluded minors and improved graph decompositions , 1994, SODA '94.
[21] David A. Patterson,et al. A new approach to I/O performance evaluation: self-scaling I/O benchmarks, predicted I/O performance , 1994, TOCS.
[22] Reinhold Weicker,et al. A detailed look at some popular benchmarks , 1991, Parallel Comput..
[23] David A. Patterson,et al. A new approach to I/O performance evaluation: self-scaling I/O benchmarks, predicted I/O performance , 1993, SIGMETRICS '93.
[24] W. M. Gentleman,et al. Fast Fourier Transforms: for fun and profit , 1966, AFIPS '66 (Fall).
[25] Gary L. Miller,et al. Separators in two and three dimensions , 1990, STOC '90.
[26] Richard E. Twogood,et al. An Extension of Eklundh's Matrix Transposition Algorithm and Its Application in Digital Image Processing , 1976, IEEE Transactions on Computers.
[27] J. E. Kelley. An Application of Linear Programming to Curve Fitting , 1958 .
[28] Guy L. Steele,et al. Data Optimization: Allocation of Arrays to Reduce Communication on SIMD Machines , 1990, J. Parallel Distributed Comput..
[29] Joseph W. H. Liu,et al. On the storage requirement in the out-of-core multifrontal method for sparse factorization , 1986, TOMS.
[30] Y. Saad,et al. Practical Use of Polynomial Preconditionings for the Conjugate Gradient Method , 1985 .
[31] Thomas Fahringer,et al. A static parameter based performance prediction tool for parallel programs , 1993, ICS '93.
[32] John Noye,et al. Finite Difference Techniques for Partial Differential Equations , 1984 .
[33] Jan Mandel,et al. An iterative solver for p-version finite elements in three dimensions , 1994 .
[34] Guy L. Steele,et al. The High Performance Fortran Handbook , 1993 .
[35] Seth Copen Goldstein,et al. Active messages: a mechanism for integrating communication and computation , 1998, ISCA '98.
[36] D. W. Barron,et al. Solution of Simultaneous Linear Equations using a Magnetic-Tape Store , 1960, Comput. J..
[37] Subhash Saini,et al. NAS Parallel Benchmarks Results 3-95 , 1995 .
[38] Michael T. Heath,et al. Solution of Large-Scale Sparse Least Squares Problems Using Auxiliary Storage , 1981 .
[39] Rice University,et al. High performance Fortran language specification , 1993 .
[40] Saul Rosen,et al. Electronic Computers: A Historical Survey , 1969, CSUR.
[41] Peter Ming-Chien Chen. Input-output performance evaluation: self-scaling benchmarks, predicted performance , 1992 .
[42] J. Gillis,et al. Matrix Iterative Analysis , 1961 .
[43] C.-C. Jay Kuo,et al. Two-Color Fourier Analysis of Iterative Algorithms for Elliptic Problems with Red/Black Ordering , 1990, SIAM J. Sci. Comput..
[44] Guy E. Blelloch,et al. Implementation of a portable nested data-parallel language , 1993, PPOPP '93.
[45] Eric A. Brewer,et al. High-level optimization via automated statistical modeling , 1995, PPOPP '95.
[46] Donald MacKenzie,et al. The Influence of the Los Alamos and Livermore National Laboratories on the Development of Supercomputing , 1991, Annals of the History of Computing.
[47] Edward G. Coffman,et al. Organizing matrices and matrix operations for paged memory systems , 1969, Commun. ACM.
[48] H. A. Van Der Vorst,et al. (M)ICCG for 2D problems on vector computers , 1987 .
[49] Marina C. Chen,et al. The Data Alignment Phase in Compiling Programs for Distributed-Memory Machines , 1991, J. Parallel Distributed Comput..
[50] C. Van Loan. Computational Frameworks for the Fast Fourier Transform , 1992 .
[51] Eric A. Brewer,et al. Portable high-performance supercomputing: high-level platform-dependent optimization , 1994 .
[52] P. J. Denning. QUEUEING MODELS FOR FILE MEMORY OPERATION , 1965 .
[53] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[54] J. Demmel. Numerical linear algebra , 1993 .
[55] Stanley C. Eisenstat,et al. Software for Sparse Gaussian Elimination with Limited Core Storage. , 1978 .
[56] A. George,et al. Auxiliary Storage Methods for Solving Finite Element Systems , 1985 .
[57] Joseph W. H. Liu,et al. The multifrontal method and paging in sparse Cholesky factorization , 1989, TOMS.
[58] David H. Bailey,et al. FFTs in external or hierarchical memory , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[59] Ken Kennedy,et al. A static performance estimator to guide data partitioning decisions , 1991, PPOPP '91.
[60] U. Schumann,et al. Comments on "A Fast Computer Method for Matrix Transposing" and Application to the Solution of Poisson's Equation , 1973, IEEE Trans. Computers.
[61] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1990 .
[62] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[63] Roland W. Freund,et al. On Adaptive Weighted Polynomial Preconditioning for Hermitian Positive Definite Matrices , 1994, SIAM J. Sci. Comput..
[64] Philip H. Dorn,et al. The Soul of a New Machine , 1982, Annals of the History of Computing.
[65] Klaus-Jürgen Bathe,et al. Direct solution of large systems of linear equations , 1974 .
[66] Horst D. Simon,et al. Solution of large, dense symmetric generalized eigenvalue problems using secondary storage , 1988, TOMS.
[67] Guy E. Blelloch,et al. Scan primitives and parallel vector models , 1989 .
[68] Petter E. Bjørstad,et al. A large scale, sparse, secondary storage, direct linear equation solver for structural analysis and its implementation on vector and parallel architectures , 1987, Parallel Comput..
[69] H. T. Kung. Memory requirements for balanced computer architectures , 1986, ISCA '86.
[70] Raj Jain,et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.
[71] R. C. Malone,et al. Parallel ocean general circulation modeling , 1992 .
[72] J. O. Eklundh,et al. A Fast Computer Method for Matrix Transposing , 1972, IEEE Transactions on Computers.
[73] M. M. Stabrowski. A block equation solver for large unsymmetric matrices arising in the boundary integral equation method , 1985 .
[74] Katta G. Murty,et al. Linear complementarity, linear and nonlinear programming , 1988 .
[75] L. J. Comrie,et al. Mathematical Tables and Other Aids to Computation. , 1946 .
[76] N. B. MacDonald. Predicting Execution Times of Sequential Scientific Kernels , 1994, Automatic Parallelization.
[77] Bernd Fischer, Roland W. Freund. An Inner Product-Free Conjugate Gradient-Like Algorithm for Hermitian Positive Definite Systems , 1994 .
[78] Satish Rao,et al. New graph decompositions and fast emulations in hypercubes and butterflies , 1993, SPAA '93.
[79] Joseph W. H. Liu,et al. The Multifrontal Method for Sparse Matrix Solution: Theory and Practice , 1992, SIAM Rev..
[80] M. Hestenes,et al. Methods of conjugate gradients for solving linear systems , 1952 .
[81] Baruch Awerbuch,et al. Sparse partitions , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.
[82] A. George. Nested Dissection of a Regular Finite Element Mesh , 1973 .
[83] Elizabeth H. Cuthill,et al. Digital Computers in Nuclear Reactor Design , 1964, Adv. Comput..
[84] John K. Reid,et al. Solving Large Full Sets of Linear Equations in a Paged Virtual Store , 1981, TOMS.
[85] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[86] G. Golub,et al. Iterative solution of linear systems , 1991, Acta Numerica.
[87] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[88] Sivan Toledo,et al. Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers , 1997, J. Comput. Syst. Sci..
[89] Daniel A. Reed,et al. Performance observability , 1990 .
[90] Graham H. Powell,et al. Large capacity equation solver for structural analysis , 1974 .
[91] A. L. Scherr,et al. AN ANALYSIS OF TIME-SHARED COMPUTER SYSTEMS , 1965 .
[92] R. Grimes,et al. On vectorizing incomplete factorization and SSOR preconditioners , 1988 .
[93] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[94] Guy E. Blelloch,et al. NESL: A Nested Data-Parallel Language (Version 2.6) , 1993 .
[96] Sharon E. Perl. Performance assertion checking , 1993, SOSP '93.
[97] I. Gustafsson. A class of first order factorization methods , 1978 .
[98] Edward D. Lazowska,et al. Quantitative system performance - computer system analysis using queueing network models , 1983, Int. CMG Conference.
[99] R. Singleton,et al. A method for computing the fast Fourier transform with auxiliary memory and limited high-speed storage , 1967, IEEE Transactions on Audio and Electroacoustics.
[100] James V. Beck,et al. Parameter Estimation in Engineering and Science , 1977 .
[101] William Orchard-Hays,et al. Advanced Linear-Programming Computing Techniques , 1968 .
[102] Edward D. Lazowska,et al. Quantitative System Performance , 1985, Int. CMG Conference.
[103] Anthony T. Chronopoulos,et al. s-step iterative methods for symmetric linear systems , 1989 .
[104] William L. Briggs,et al. A multigrid tutorial , 1987 .
[105] Dennis Gannon,et al. Building analytical models into an interactive performance prediction tool , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[106] Kevin J. M. Moriarty,et al. A Modified Conjugate Gradient Solver for Very Large Systems , 1985, ICPP.