Training neural networks by marginalizing out hidden layer noise

The generalization ability of neural networks is influenced by the size of the training set. Training a single-hidden-layer feedforward neural network (SLFN) consists of two stages: a nonlinear feature mapping into the hidden layer space, followed by optimization of a predictor in that space. In this paper, we propose a new approach, called marginalizing out hidden layer noise (MHLN), in which the predictor of an SLFN is effectively trained on infinitely many samples. First, MHLN augments the training set in the hidden layer space with corrupted samples, generated by perturbing the hidden layer outputs of the training samples with noise drawn from a given distribution. For any training sample, as the number of corruptions approaches infinity, the weak law of large numbers allows the explicitly generated corrupted samples to be replaced by their expectation. In this way, the training set is implicitly extended in the hidden layer space by an infinite number of corrupted samples. MHLN then constructs the predictor of the SLFN by minimizing the expected value of a quadratic loss function under the given noise distribution. Experiments on twenty benchmark datasets show that MHLN improves generalization ability.
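
To make the marginalization step concrete, the sketch below trains the output layer of an SLFN against the expected quadratic loss under additive Gaussian noise on the hidden layer outputs, so the closed-form solution replaces explicit corrupted copies of the data. It assumes an ELM-style random feature mapping, a Gaussian noise model, and a ridge term; the function names and parameters are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def train_mhln_predictor(X, T, n_hidden=200, noise_var=0.1, reg=1e-3, seed=None):
    """Illustrative sketch: SLFN output weights trained by marginalizing
    additive Gaussian noise on the hidden layer outputs (assumed setup).

    X : (n, d) inputs, T : (n, c) regression or one-hot targets.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Stage 1: nonlinear feature mapping into the hidden layer space
    # (a random, ELM-style mapping is assumed here for simplicity).
    W_in = rng.normal(size=(d, n_hidden))
    b_in = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b_in)                 # (n, n_hidden)

    # Stage 2: minimize the EXPECTED quadratic loss
    #   E[ || (H + E) W - T ||_F^2 ],   E_ij ~ N(0, noise_var),
    # instead of averaging over explicitly corrupted copies of H.
    # Because E[(H + E)^T (H + E)] = H^T H + n * noise_var * I and
    # E[(H + E)^T T] = H^T T, the expectation has the closed form below.
    A = H.T @ H + (n * noise_var + reg) * np.eye(n_hidden)
    W_out = np.linalg.solve(A, H.T @ T)
    return W_in, b_in, W_out

def predict(X, W_in, b_in, W_out):
    # Apply the same hidden layer mapping, then the marginalized predictor.
    return np.tanh(X @ W_in + b_in) @ W_out
```

Under these assumptions the marginalized noise acts as an extra ridge penalty of strength n * noise_var on the output weights, which is why no corrupted samples ever need to be generated explicitly.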
