Towards Optimal Winograd Convolution on Manycores
F. Durand | A. Zlateski | Zhen Jia | Kai Li
[1] S. Winograd. Arithmetic complexity of computations, 1980.
[2] Vijay Madisetti. The Digital Signal Processing Handbook, Second Edition - 3 Volume Set, 2009.
[3] Vincent Vanhoucke, et al. Improving the speed of neural networks on CPUs, 2011.
[4] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[5] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[6] Yann LeCun, et al. Fast Training of Convolutional Networks through FFTs, 2013, ICLR.
[7] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[8] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.
[9] Sebastian Scherer, et al. VoxNet: A 3D Convolutional Neural Network for real-time object recognition, 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[10] Jimeng Sun, et al. An input-adaptive and in-place approach to dense tensor-times-matrix multiply, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] G. Henry, et al. LIBXSMM: A High Performance Library for Small Matrix Multiplications, 2015.
[12] Kai Li, et al. Full correlation matrix analysis of fMRI data on Intel® Xeon Phi™ coprocessors, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Sebastian Scherer, et al. 3D Convolutional Neural Networks for landing zone detection from LiDAR, 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[14] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[16] Avinash Sodani, et al. Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition 2nd Edition, 2016.
[17] H. Sebastian Seung, et al. ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] H. Sebastian Seung, et al. ZNN -- A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines, 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[19] Thomas Brox, et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, 2016, MICCAI.
[20] Andrew Lavin, et al. Fast Algorithms for Convolutional Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] H. Sebastian Seung, et al. Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs, 2017, ICS '17.
[23] Daniel Thalmann, et al. 3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Gianni De Fabritiis, et al. DeepSite: protein‐binding site predictor using 3D‐convolutional neural networks, 2017, Bioinform.
[25] Nir Shavit, et al. Deep Tensor Convolution on Multicores, 2016, ICML.
[26] Kevin Vincent, et al. On Improving the Numerical Stability of Winograd Convolutions, 2017, ICLR.
[27] Frédo Durand, et al. Optimizing N-dimensional, winograd-based convolution for manycore CPUs, 2018, PPoPP.