Sparse Matrix Vector Processing Formats
[1] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.
[2] Roman Geus,et al. Towards a fast parallel sparse matrix-vector multiplication , 2000, PARCO.
[3] Hiroshi Okuda,et al. Performance Optimization of GeoFEM Fluid Analysis Code on Various Computer Architectures , 2002 .
[4] Ronald F. Boisvert,et al. Developing numerical libraries in Java , 1998, Concurr. Pract. Exp..
[5] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[6] Ernst Schrem,et al. Computer Implementation of the Finite-Element Procedure , 1973 .
[7] A. Lumsdaine,et al. A Sparse Matrix Library in C++ for High Performance Architectures , 1994 .
[8] Stamatis Vassiliadis,et al. Architectural Support for 3D Graphics in the Complex Streamed Instruction Set , 2002, IASTED PDCS.
[9] Stamatis Vassiliadis,et al. The MOLEN ρμ-coded processor , 2001 .
[10] John Wawrzynek,et al. Vector microprocessors , 1998 .
[11] Mateo Valero,et al. Decoupled vector architectures , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[12] Stamatis Vassiliadis,et al. The Molen Programming Paradigm , 2004, SAMOS.
[13] Jack J. Dongarra. Performance of various computers using standard linear equations software in a Fortran environment , 1983, CARN.
[14] Yousef Saad,et al. A benchmark package for sparse matrix computations , 1988, ICS '88.
[15] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[16] Stamatis Vassiliadis,et al. The MOLEN polymorphic processor , 2004, IEEE Transactions on Computers.
[17] Gerry Kane,et al. MIPS RISC Architecture , 1987 .
[18] Leonid Oliker,et al. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations , 2003, SC.
[19] Mateo Valero,et al. Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[20] Guy E. Blelloch,et al. Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors , 1993 .
[21] James Demmel,et al. Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[22] J. Z. Zhu,et al. The finite element method , 1977 .
[23] R. E. Kessler,et al. Cray T3D: a new dimension for Cray Research , 1993, Digest of Papers. Compcon Spring.
[24] Stamatis Vassiliadis. Polymorphic Processors: How to Expose Arbitrary Hardware Functionality to Programmers , 2004, PACT 2004.
[25] Alexandru Nicolau,et al. Computing Programs Containing Band Linear Recurrences on Vector Supercomputers , 1996, IEEE Trans. Parallel Distributed Syst..
[26] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[27] Iain S. Duff,et al. Users' guide for the Harwell-Boeing sparse matrix collection (Release 1) , 1992 .
[28] Geoffrey C. Fox,et al. The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers , 1989, Int. J. High Perform. Comput. Appl..
[29] Y. Saad,et al. Numerical solution of large nonsymmetric eigenvalue problems , 1989 .
[30] Stamatis Vassiliadis,et al. The MOLEN processor prototype , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[31] Stamatis Vassiliadis,et al. A Hierarchical sparse matrix storage format for vector processors , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[32] David H. Bailey,et al. NAS parallel benchmark results , 1992, Proceedings Supercomputing '92.
[33] Krste Asanovic,et al. Torrent Architecture Manual , 1997 .
[34] Youcef Saad,et al. A Basic Tool Kit for Sparse Matrix Computations , 1990 .
[35] Mateo Valero,et al. Out-of-order vector architectures , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[36] John Wawrzynek,et al. T0: A Single-Chip Vector Microprocessor with Reconfigurable Pipelines , 1996, ESSCIRC '96: Proceedings of the 22nd European Solid-State Circuits Conference.
[37] Werner Buchholz. The IBM System/370 Vector Architecture , 1986, IBM Syst. J..
[38] Peter M. Kogge,et al. The Architecture of Pipelined Computers , 1981 .
[39] Stamatis Vassiliadis,et al. Performance of the Complex Streamed Instruction Set on Image Processing Kernels , 2001, Euro-Par.
[40] Stamatis Vassiliadis,et al. Sparse matrix transpose unit , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[41] Stamatis Vassiliadis,et al. Performance Scalability of Multimedia Instruction Set Extensions , 2002, Euro-Par.
[42] Stamatis Vassiliadis,et al. Implementation and evaluation of the Complex Streamed Instruction set , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[43] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[44] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[45] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[46] Victor Eijkhout,et al. LAPACK Working Note 50: Distributed Sparse Data Structures for Linear Algebra Operations , 1992 .
[47] Sivan Toledo,et al. Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..
[48] Stamatis Vassiliadis,et al. Implementation of a streaming execution unit , 2002, Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools.
[49] Stamatis Vassiliadis,et al. Block Based Compression Storage Expected Performance , 2002 .
[50] Richard F. Barrett,et al. Matrix Market: a web resource for test matrix collections , 1996, Quality of Numerical Software.
[51] Y. Saad,et al. Krylov Subspace Methods on Supercomputers , 1989 .
[52] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[53] W. J. Watson. The TI ASC: a highly modular and flexible super computer architecture , 1972, AFIPS '72 (Fall, part I).
[54] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[55] J.S.S.M. Wong,et al. Microcoded Reconfigurable Embedded Processors , 2002 .
[56] Brian B. Moore,et al. The IBM System/370 Vector Architecture: Design Considerations , 1988, IEEE Trans. Computers.
[57] Wai-Mee Ching,et al. Sparse matrix technology tools in APL , 1990 .
[58] Kesheng Wu,et al. A Revised Proposal for a Sparse BLAS Toolkit , 1994 .
[59] Yousef Saad,et al. SPARK: a benchmark package for sparse computations , 1990, ICS '90.
[60] Stamatis Vassiliadis,et al. D-SAB: A Sparse Matrix Benchmark Suite , 2003, PaCT.
[61] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[62] A. Pinar,et al. Improving Performance of Sparse Matrix-Vector Multiplication , 1999, ACM/IEEE SC 1999 Conference (SC'99).