Frédo Durand | Kai Li | Zhen Jia | Aleksandar Zlateski
[1] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Tamara G. Kolda, et al. Tensor Decompositions and Applications, 2009, SIAM Review.
[3] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.
[4] Frédo Durand, et al. Optimizing N-dimensional, Winograd-based convolution for manycore CPUs, 2018, PPoPP.
[5] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Nir Shavit, et al. Deep Tensor Convolution on Multicores, 2016, ICML.
[7] H. Sebastian Seung, et al. ZNN: A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines, 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[8] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[9] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[10] Jimeng Sun, et al. An input-adaptive and in-place approach to dense tensor-times-matrix multiply, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98).
[12] H. Sebastian Seung, et al. Compile-time optimized and statically scheduled N-D convnet primitives for multi-core and many-core (Xeon Phi) CPUs, 2017, ICS '17.
[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[14] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, arXiv.
[15] Vincent Vanhoucke, et al. Improving the speed of neural networks on CPUs, 2011.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Communications of the ACM.
[17] Andrew Lavin, et al. Fast Algorithms for Convolutional Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Victor Y. Pan, et al. How Bad Are Vandermonde Matrices?, 2015, SIAM Journal on Matrix Analysis and Applications.
[19] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, ACM SIGARCH Computer Architecture News.
[20] Dennis Gannon, et al. Strategies for cache and local memory management by global program transformation, 1988, Journal of Parallel and Distributed Computing.
[21] Yann LeCun, et al. Fast Training of Convolutional Networks through FFTs, 2013, ICLR.
[22] C. K. Yuen, et al. Theory and Application of Digital Signal Processing, 1978, IEEE Transactions on Systems, Man, and Cybernetics.
[23] H. Sebastian Seung, et al. ZNNi: Maximizing the Inference Throughput of 3D Convolutional Networks on CPUs and GPUs, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[24] Avinash Sodani, et al. Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition, 2nd Edition, 2016.
[25] Kevin Vincent, et al. On Improving the Numerical Stability of Winograd Convolutions, 2017, ICLR.
[26] G. Henry, et al. LIBXSMM: A High Performance Library for Small Matrix Multiplications, 2015.
[27] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[28] M. D. MacLaren. The Art of Computer Programming. Volume 2: Seminumerical Algorithms (Donald E. Knuth), 1970.