MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA
