CATERPILLAR: Coarse Grain Reconfigurable Architecture for accelerating the training of Deep Neural Networks