BLAS Comparison on FPGA, CPU and GPU
[1] Florent de Dinechin, et al. An FPGA-specific approach to floating-point accumulation and sum-of-products, 2008, 2008 International Conference on Field-Programmable Technology.
[2] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.
[3] Mi Lu, et al. Group-Alignment based Accurate Floating-Point Summation on FPGAs, 2006, ERSA.
[4] Viktor K. Prasanna, et al. Scalable and modular algorithms for floating-point matrix multiplication on FPGAs, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[5] Siddharth Joshi, et al. FPGA Based High Performance Double-Precision Matrix Multiplication, 2009, 2009 22nd International Conference on VLSI Design.
[6] Robert A. van de Geijn, et al. BLAS (Basic Linear Algebra Subprograms), 2011, Encyclopedia of Parallel Computing.
[7] Viktor K. Prasanna, et al. High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs, 2007, IEEE Transactions on Parallel and Distributed Systems.
[8] Gene H. Golub, et al. Matrix Computations (3rd ed.), 1996.
[9] Chen Chang, et al. BEE3: Revitalizing Computer Architecture Research, 2009.
[10] Karl S. Hemmert, et al. Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance, 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[11] Viktor K. Prasanna, et al. Scalable hybrid designs for linear algebra on reconfigurable computing systems, 2006, 12th International Conference on Parallel and Distributed Systems (ICPADS'06).
[12] Margaret Martonosi, et al. Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques, 2000, IEEE Trans. Computers.
[13] Martin C. Herbordt, et al. Effective Floating Point Applications on FPGAs: Examples from Molecular Modeling, 2009.
[14] FPGA Coprocessing Evolution: Sustained Performance Approaches Peak Performance (white paper), 1998.
[15] Viktor K. Prasanna, et al. Hardware/Software Co-Design for Matrix Computations on Reconfigurable Computing Systems, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[16] Viktor K. Prasanna, et al. Designing scalable FPGA-based reduction circuits using pipelined floating-point cores, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[17] Leonid Oliker. Green Flash: Designing an energy efficient climate supercomputer, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[18] Sadaf R. Alam, et al. Scientific Computing Beyond CPUs: FPGA implementations of common scientific kernels, 2005.
[19] Viktor K. Prasanna, et al. High Performance Linear Algebra Operations on Reconfigurable Systems, 2005, ACM/IEEE SC 2005 Conference (SC'05).