Scalable hybrid designs for linear algebra on reconfigurable computing systems
[1] Diederik Verkest,et al. Run-Time Minimization of Reconfiguration Overhead in Dynamically Reconfigurable Systems , 2003, FPL.
[2] Wayne Luk,et al. Efficient Hardware Generation of Random Variates with Arbitrary Distributions , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[3] Viktor K. Prasanna,et al. Sparse Matrix-Vector multiplication on FPGAs , 2005, FPGA '05.
[4] Robert A. van de Geijn,et al. Parallel implementation of BLAS: general techniques for Level 3 BLAS , 1995, Concurr. Pract. Exp..
[5] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[6] Viktor K. Prasanna,et al. A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing , 2005, ERSA.
[7] Viktor K. Prasanna,et al. A Hybrid Approach for Mapping Conjugate Gradient onto an FPGA-Augmented Reconfigurable Supercomputer , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[8] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.
[9] Jaeyoung Choi,et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines , 1994, Sci. Program..
[10] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1997 .
[11] High-Performance Math Libraries: Who says you can't get performance and accuracy for free? , 2005 .
[12] Yong Dou,et al. 64-bit floating-point FPGA matrix multiplication , 2005, FPGA '05.
[13] Viktor K. Prasanna,et al. Cache-Friendly implementations of transitive closure , 2007, IEEE PACT.
[14] Srinivas Katkoori,et al. Power minimization algorithms for LUT-based FPGA technology mapping , 2004, TODE.
[15] R. K. Shyamasundar,et al. Introduction to algorithms , 1996 .
[16] Ramachandran Vaidyanathan,et al. Adaptive image filtering using run-time reconfiguration , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[17] Viktor K. Prasanna,et al. High Performance Linear Algebra Operations on Reconfigurable Systems , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[18] Karl S. Hemmert,et al. Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[19] Warren J. Gross,et al. Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[20] Viktor K. Prasanna,et al. Hardware/Software Approach to Molecular Dynamics on Reconfigurable Computers , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[21] Eric Stahlberg,et al. Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[22] Jack Dongarra,et al. LINPACK Users' Guide , 1987 .
[23] Y. El-Kurdi,et al. Hardware Acceleration for Finite-Element Electromagnetics: Efficient Sparse Matrix Floating-Point Computations With FPGAs , 2007, IEEE Transactions on Magnetics.
[24] Robert K. Brayton,et al. HW/SW partitioning and code generation of embedded control applications on a reconfigurable architecture platform , 2002, Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627).
[25] Jaeyoung Choi,et al. Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers , 1994, Concurr. Pract. Exp..
[26] Viktor K. Prasanna,et al. Scalable and modular algorithms for floating-point matrix multiplication on FPGAs , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[28] Jim Stevens,et al. Enabling a Uniform Programming Model Across the Software/Hardware Boundary , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[29] André DeHon,et al. Floating-point sparse matrix-vector multiply for FPGAs , 2005, FPGA '05.
[30] Neil W. Bergmann,et al. An FPGA network architecture for accelerating 3DES-CBC , 2005, International Conference on Field Programmable Logic and Applications, 2005.
[31] Sartaj Sahni,et al. A blocked all-pairs shortest-paths algorithm , 2003, ACM J. Exp. Algorithmics.
[32] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[34] Uday Bondhugula,et al. Parallel FPGA-based all-pairs shortest-paths in a directed graph , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.