Direct Error-Driven Learning for Deep Neural Networks With Applications to Big Data

In this brief, heterogeneity and noise in big data are shown to increase the generalization error of the traditional learning regime used for deep neural networks (deep NNs). To reduce this error while overcoming the issue of vanishing gradients, a direct error-driven learning (EDL) scheme is proposed. First, to reduce the impact of heterogeneity and data noise, the concept of a neighborhood is introduced. Using this neighborhood, an approximation of the generalization error is obtained, and an overall error, composed of the learning error and the approximate generalization error, is defined. A novel NN weight-tuning law is then derived from a layer-wise performance measure that enables the direct use of the overall error for learning. Additional constraints are introduced into the layer-wise performance measure to guide and improve learning in the presence of noisy dimensions. The proposed direct EDL scheme thus addresses heterogeneity and noise while mitigating vanishing gradients and noisy dimensions. A comprehensive simulation study shows that the proposed approach mitigates the vanishing gradient problem while improving generalization by 6%.
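
The abstract does not give the update equations, so the following is only a minimal illustrative sketch of what a direct, layer-wise error-driven update could look like, under two assumptions: (i) the overall error is a weighted sum of the learning error and a neighborhood-based surrogate for the generalization error, and (ii) the overall error is delivered directly to each hidden layer through fixed random projection matrices, in the spirit of direct feedback alignment rather than the paper's exact tuning law. All names (edl_step, neighborhood_targets, lam, B, the layer sizes) and the neighborhood surrogate itself are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    return (x > 0.0).astype(x.dtype)

# Hypothetical two-hidden-layer regression network; sizes are illustrative only.
sizes = [20, 64, 64, 1]
W = [rng.normal(0, np.sqrt(2.0 / m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(n) for n in sizes[1:]]

# Fixed random projections that deliver the overall output error directly to each
# hidden layer (direct-feedback-alignment style), sidestepping the layer-by-layer
# chain of derivatives that causes vanishing gradients.
B = [rng.normal(0, 1.0 / np.sqrt(sizes[-1]), size=(sizes[-1], n)) for n in sizes[1:-1]]

def forward(x):
    pre, post = [], [x]
    for l, (Wl, bl) in enumerate(zip(W, b)):
        z = post[-1] @ Wl + bl
        pre.append(z)
        post.append(relu(z) if l < len(W) - 1 else z)  # linear output layer
    return pre, post

def neighborhood_targets(Y, k=5):
    # Crude stand-in for the neighborhood idea: average each sample's target with
    # its k nearest neighbors (here in target space, purely for illustration) to
    # form a surrogate for the error on nearby, unseen points.
    d = np.abs(Y - Y.T)
    idx = np.argsort(d, axis=1)[:, :k]
    return Y[idx].mean(axis=1)

def edl_step(X, Y, lr=1e-3, lam=0.5):
    pre, post = forward(X)
    e_learn = post[-1] - Y                       # learning error
    e_gen = post[-1] - neighborhood_targets(Y)   # approximate generalization error
    e = e_learn + lam * e_gen                    # overall error
    for l in range(len(W)):
        # Each layer consumes the overall error directly; hidden layers receive it
        # through their fixed random projection B[l].
        delta = e if l == len(W) - 1 else (e @ B[l]) * relu_grad(pre[l])
        W[l] -= lr * post[l].T @ delta / len(X)
        b[l] -= lr * delta.mean(axis=0)
    return float(np.mean(e_learn ** 2))

# Toy usage on synthetic data.
X = rng.normal(size=(256, 20))
Y = X[:, :1] ** 2 + 0.1 * rng.normal(size=(256, 1))
for epoch in range(200):
    mse = edl_step(X, Y)
```

Because each layer's update depends only on the overall error and its own pre-activations, no gradient is propagated backward through the layer chain, which is the property that lets a direct scheme of this kind avoid vanishing gradients; the lam-weighted neighborhood term is one simple way an overall error combining learning and approximate generalization errors could be formed.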
