A Linear Algebra Core Design for Efficient Level-3 BLAS
暂无分享,去创建一个
Robert A. van de Geijn | Andreas Gerstlauer | Nam Sung Kim | Michael J. Schulte | Ardavan Pedram | Syed Zohaib Gilani | R. V. D. Geijn | A. Gerstlauer | M. Schulte | N. Kim | A. Pedram | S. Gilani
[1] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.
[2] Sanjay J. Patel,et al. Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.
[3] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Robert A. van de Geijn,et al. A high-performance, low-power linear algebra core , 2011, ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors.
[5] Ali R. Hurson,et al. General-purpose systolic arrays , 1993, Computer.
[6] Norman P. Jouppi,et al. Architecting Efficient Interconnects for Large Caches with CACTI 6.0 , 2008, IEEE Micro.
[7] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[8] Viktor K. Prasanna,et al. High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware , 2008, IEEE Transactions on Computers.
[9] Nam Sung Kim,et al. Energy-efficient floating-point arithmetic for software-defined radio architectures , 2011, ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors.
[10] Viktor K. Prasanna,et al. Energy- and time-efficient matrix multiplication on FPGAs , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[11] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[12] Michael Parker. High-performance floating-point implementation using FPGAS , 2009, MILCOM 2009 - 2009 IEEE Military Communications Conference.