Applying fast matrix multiplication to neural networks

Recent advances in deep neural networks have enabled impressive performance in computer vision, natural language processing, and other fields, yet these networks remain computationally expensive to train and to use. We consider Winograd's algorithm for fast matrix multiplication in feedforward neural networks and find that speedups of 10% to 30% are possible for fully connected layers in large networks.
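To make the idea concrete, the following is a minimal sketch of how one level of a Winograd-style fast multiplication could replace the matrix product in a fully connected layer. It assumes the Strassen-Winograd variant (7 block multiplications and 15 block additions per level), which may differ from the exact formulation the paper uses. The names `winograd_matmul_one_level` and `dense_forward` are hypothetical, all dimensions are assumed to be even, and NumPy stands in for a tuned BLAS or GPU kernel, so the sketch illustrates the arithmetic rather than any speedup.

```python
import numpy as np


def winograd_matmul_one_level(A, B):
    """Multiply A (m x k) by B (k x n) with one level of the
    Strassen-Winograd scheme: 7 block products, 15 block additions.

    Assumption for this sketch: m, k and n are all even.
    """
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    if m % 2 or k % 2 or n % 2:
        raise ValueError("this sketch requires even dimensions")

    # Split both operands into 2 x 2 grids of blocks.
    A11, A12 = A[: m // 2, : k // 2], A[: m // 2, k // 2:]
    A21, A22 = A[m // 2:, : k // 2], A[m // 2:, k // 2:]
    B11, B12 = B[: k // 2, : n // 2], B[: k // 2, n // 2:]
    B21, B22 = B[k // 2:, : n // 2], B[k // 2:, n // 2:]

    # 8 pre-additions on the input blocks.
    S1 = A21 + A22
    S2 = S1 - A11
    S3 = A11 - A21
    S4 = A12 - S2
    T1 = B12 - B11
    T2 = B22 - T1
    T3 = B22 - B12
    T4 = T2 - B21

    # 7 block multiplications (a standard algorithm would need 8).
    M1 = A11 @ B11
    M2 = A12 @ B21
    M3 = S4 @ B22
    M4 = A22 @ T4
    M5 = S1 @ T1
    M6 = S2 @ T2
    M7 = S3 @ T3

    # 7 post-additions that assemble the output blocks.
    U2 = M1 + M6
    U3 = U2 + M7
    U4 = U2 + M5
    C11 = M1 + M2
    C12 = U4 + M3
    C21 = U3 - M4
    C22 = U3 + M5

    return np.block([[C11, C12], [C21, C22]])


def dense_forward(x, W, b):
    """Forward pass of a fully connected layer, y = x W + b,
    using the one-level fast multiplication above."""
    return winograd_matmul_one_level(x, W) + b
```

In this sketch, `x` is a batch of inputs with shape (batch, in_features), `W` has shape (in_features, out_features), and `b` has shape (out_features,). Applying the scheme recursively to the seven block products would reduce the multiplication count further, at the cost of more additions and temporary storage.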
