FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters
Forrest N. Iandola | Matthew W. Moskewicz | Khalid Ashraf | Kurt Keutzer
[1] Atsuto Maki, et al. From generic to specific deep representations for visual recognition, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[2] Forrest N. Iandola, et al. Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling, 2015, ICMR.
[3] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Marc'Aurelio Ranzato, et al. Multi-GPU Training of ConvNets, 2013, ICLR.
[5] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[6] Jian Sun, et al. Efficient and accurate approximations of nonlinear convolutional networks, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Graham W. Taylor, et al. Theano-based Large-Scale Visual Recognition with Multiple GPUs, 2014, ICLR.
[8] Andrew Zisserman, et al. Speeding up Convolutional Neural Networks with Low Rank Expansions, 2014, BMVC.
[9] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[10] Trishul M. Chilimbi, et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System, 2014, OSDI.
[11] Forrest N. Iandola, et al. DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer, 2015, ArXiv.
[12] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[13] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[14] Paolo Costa, et al. Optimizing Network Performance in Distributed Machine Learning, 2015, HotCloud.
[15] Thomas M. Breuel, et al. The Effects of Hyperparameters on SGD Training of Neural Networks, 2015, ArXiv.
[16] Krste Asanovic, et al. FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers, 2014.
[17] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[18] Dong Yu, et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, 2012, ICML.
[19] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[20] Andrew Lavin, et al. maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs, 2015, ArXiv.
[21] Song Han, et al. A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding, 2015.
[22] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[23] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[24] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[25] Qiang Chen, et al. Network In Network, 2013, ICLR.
[26] Jitendra Malik, et al. Deformable part models are convolutional neural networks, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] S. Goldsack, et al. IN REAL-TIME, 2008.
[28] Erich Elsen, et al. Deep Speech: Scaling up end-to-end speech recognition, 2014, ArXiv.
[29] Forrest N. Iandola, et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids, 2014, ArXiv.
[30] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Brian Kingsbury, et al. Spert-II: A Vector Microprocessor System, 1996, Computer.
[32] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[33] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[34] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[35] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Forrest N. Iandola, et al. Communication-minimizing 2D convolution in GPU registers, 2013, 2013 IEEE International Conference on Image Processing.
[37] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res..
[38] Yi Li, et al. Mariana: Tencent Deep Learning Platform and its Applications, 2014, Proc. VLDB Endow..
[39] Shengen Yan, et al. Deep Image: Scaling up Image Recognition, 2015, ArXiv.
[40] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[41] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[42] D. H. Mellor, et al. Real time, 1981.
[43] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[44] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl..
[45] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.