A Deeper Look at FFT and Winograd Convolutions
F. Durand | A. Zlateski | Zhen Jia | Kai Li
[1] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[2] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[3] Vincent Vanhoucke, et al. Improving the speed of neural networks on CPUs, 2011.
[4] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[5] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[6] R. Fergus, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.
[7] Yann LeCun, et al. Fast Training of Convolutional Networks through FFTs, 2013, ICLR.
[8] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[9] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.
[10] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[12] Avinash Sodani, et al. Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition 2nd Edition, 2016.
[13] H. Sebastian Seung, et al. ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] H. Sebastian Seung, et al. ZNN -- A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines, 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[15] Andrew Lavin, et al. Fast Algorithms for Convolutional Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[17] H. Sebastian Seung, et al. Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs, 2017, ICS '17.
[18] Nir Shavit, et al. Deep Tensor Convolution on Multicores, 2016, ICML.
[19] Kevin Vincent, et al. On Improving the Numerical Stability of Winograd Convolutions, 2017, ICLR.
[20] Frédo Durand, et al. Optimizing N-dimensional, Winograd-based convolution for manycore CPUs, 2018, PPoPP.