Algorithm-oriented design of efficient many-core architectures applied to dense matrix multiplication
Mário P. Véstias | Horácio C. Neto | Ana Rita Silva | Wilson M. José
[1] Bishop Brock, et al. Architecting for power management: The IBM® POWER7™ approach, 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[2] Martin Berggren, et al. Hybrid differentiation strategies for simulation and analysis of applications in C++, 2008, TOMS.
[3] Ninghui Sun, et al. Fast implementation of DGEMM on Fermi GPU, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[4] H. Peter Hofstee, et al. Power efficient processor architecture and the cell processor, 2005, 11th International Symposium on High-Performance Computer Architecture.
[5] Gene H. Golub, et al. Scientific computing: an introduction with parallel computing, 1993.
[6] Dhiraj K. Pradhan, et al. A Routing-Aware ILS Design Technique, 2011, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[7] Viktor K. Prasanna, et al. Energy efficient architecture for matrix multiplication on FPGAs, 2013, 2013 23rd International Conference on Field Programmable Logic and Applications.
[8] Robert A. van de Geijn, et al. Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures, 2012, IEEE Transactions on Computers.
[9] Viktor K. Prasanna, et al. High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware, 2008, IEEE Transactions on Computers.
[10] Brett M. Bode, et al. Performance analysis of memory transfers and GEMM subroutines on NVIDIA Tesla GPU cluster, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[11] Sanjay Ranka, et al. Energy and performance tradeoffs for matrix multiplication on multicore machines, 2012, 2012 International Green Computing Conference (IGCC).
[12] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[13] Philip Heng Wai Leong, et al. A Model for Matrix Multiplication Performance on FPGAs, 2011, 2011 21st International Conference on Field Programmable Logic and Applications.
[14] Yong Dou, et al. 64-bit floating-point FPGA matrix multiplication, 2005, FPGA '05.
[15] S. Borkar, et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS, 2008, IEEE Journal of Solid-State Circuits.
[16] Thorsten Grotker, et al. System Design with SystemC, 2002.
[17] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.
[18] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.