Deep Neural Networks with Short Circuits for Improved Gradient Learning

Deep neural networks have achieved great success in both computer vision and natural language processing tasks. However, most state-of-the-art methods rely heavily on external training data or computation to improve performance. To reduce this external reliance, we propose a gradient-enhancement approach, realized through short-circuit neural connections, to improve the gradient learning of deep neural networks. The proposed short circuit is a unidirectional connection that back-propagates only the sensitivity (error signal) from a deep layer to shallower layers. Moreover, the short circuit acts as a gradient truncation across the layers it spans and can be plugged into backbone deep neural networks without introducing additional training parameters. Extensive experiments demonstrate that deep neural networks equipped with our short circuits outperform the baselines by a large margin on both computer vision and natural language processing tasks.
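
The abstract does not give the authors' implementation, but the idea of a forward-transparent, gradient-only connection can be sketched in a few lines of PyTorch. The sketch below is illustrative only: the class name GradientShortCircuit, the block names, and the assumption that the shallow and deep features share a shape are ours, not taken from the paper.

    import torch

    class GradientShortCircuit(torch.autograd.Function):
        # Forward pass: identity on the deep feature, so the network's output
        # and parameter count are unchanged.
        # Backward pass: the incoming sensitivity flows to the deep feature as
        # usual and is also copied directly to the shallow feature, truncating
        # the gradient path through the intermediate layers.
        @staticmethod
        def forward(ctx, deep_feature, shallow_feature):
            return deep_feature.view_as(deep_feature)

        @staticmethod
        def backward(ctx, grad_output):
            # Route the same error signal to both inputs. This sketch assumes
            # the two features share a shape; a real model would need to
            # reshape or pool the gradient to fit the shallow layer.
            return grad_output, grad_output

    # Usage inside an ordinary feed-forward stack (block1..block3 are any modules):
    # h1 = block1(x)                           # shallow features
    # h3 = block3(block2(h1))                  # deep features
    # h3 = GradientShortCircuit.apply(h3, h1)  # gradient-only path from h3 back to h1

Because the connection contributes nothing in the forward pass, it adds no trainable parameters and leaves the backbone's predictions untouched; it only alters how the error signal is distributed during back-propagation.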
