MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA
