Augmented Efficient BackProp for backpropagation learning in deep autoassociative neural networks

We introduce Augmented Efficient BackProp as a strategy for applying the backpropagation algorithm to deep autoencoders, i.e., autoassociators with many hidden layers, without relying on weight initialization by restricted Boltzmann machines. This training method is an extension of Efficient BackProp, first proposed by LeCun et al. [29], and is benchmarked on three different types of application datasets.
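
The abstract does not spell out the augmentations that distinguish Augmented Efficient BackProp from the original recipe, so the following is only a minimal sketch of the baseline setup it extends: a deep autoencoder trained with plain backpropagation plus the standard Efficient BackProp heuristics (zero-mean inputs, the scaled tanh activation 1.7159·tanh(2x/3), fan-in-scaled weight initialization, stochastic mini-batch updates). The layer sizes, learning rate, and synthetic data below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_tanh(x):
    # Efficient BackProp's recommended sigmoid: f(x) = 1.7159 * tanh(2x/3)
    return 1.7159 * np.tanh(2.0 * x / 3.0)

def scaled_tanh_deriv(x):
    t = np.tanh(2.0 * x / 3.0)
    return 1.7159 * (2.0 / 3.0) * (1.0 - t * t)

class DeepAutoencoder:
    def __init__(self, layer_sizes):
        # Symmetric encoder/decoder, e.g. [64, 32, 8, 32, 64]; the 8-unit
        # bottleneck plays the role of the low-dimensional code.
        self.W, self.b = [], []
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            # Fan-in-scaled initialization, as recommended in Efficient BackProp
            self.W.append(rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)))
            self.b.append(np.zeros(n_out))

    def forward(self, x):
        pre, post = [], [x]
        for W, b in zip(self.W, self.b):
            z = post[-1] @ W + b
            pre.append(z)
            post.append(scaled_tanh(z))
        return pre, post

    def train_step(self, x, lr=0.01):
        # One plain backpropagation update on the reconstruction error ||x_hat - x||^2
        pre, post = self.forward(x)
        delta = (post[-1] - x) * scaled_tanh_deriv(pre[-1])
        for i in reversed(range(len(self.W))):
            grad_W = post[i].T @ delta / len(x)
            grad_b = delta.mean(axis=0)
            if i > 0:  # propagate the error signal to the previous layer
                delta = (delta @ self.W[i].T) * scaled_tanh_deriv(pre[i - 1])
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * grad_b
        return float(np.mean((post[-1] - x) ** 2))

# Zero-mean, unit-variance inputs and stochastic (mini-batch) updates are two
# further Efficient BackProp recommendations; the data here is synthetic.
X = rng.normal(size=(256, 64))
X = (X - X.mean(axis=0)) / X.std(axis=0)
autoencoder = DeepAutoencoder([64, 32, 8, 32, 64])
for epoch in range(20):
    for start in range(0, len(X), 32):
        loss = autoencoder.train_step(X[start:start + 32], lr=0.01)
```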

[1] Martin Fodslette Møller, et al. A scaled conjugate gradient algorithm for fast supervised learning, 1993, Neural Networks.

[2] Yoshua Bengio, et al. Exploring Strategies for Training Deep Neural Networks, 2009, J. Mach. Learn. Res.

[3] E. C. Malthouse, et al. Limitations of nonlinear PCA as performed with generic neural networks, 1998, IEEE Trans. Neural Networks.

[4] Adam H. Monahan, et al. Nonlinear Principal Component Analysis by Neural Networks: Theory and Application to the Lorenz System, 2000.

[5] Chikkannan Eswaran, et al. Performance Comparison of Three Types of Autoencoder Neural Networks, 2008, 2008 Second Asia International Conference on Modelling & Simulation (AMS).

[6] Miguel Á. Carreira-Perpiñán, et al. A Review of Dimension Reduction Techniques, 2009.

[7] M. Kramer. Nonlinear principal component analysis using autoassociative neural networks, 1991.

[8] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[9] Erkki Oja, et al. The nonlinear PCA learning rule in independent component analysis, 1997, Neurocomputing.

[10] Martin Fodslette Møller. A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning, 1993.

[11] Bin-Da Liu, et al. A backpropagation algorithm with adaptive learning rate and momentum coefficient, 2002, Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02).

[12] Mark A. Kramer, et al. Autoassociative neural networks, 1992.

[13] Simon Haykin, et al. Neural Networks and Learning Machines, 2010.

[14] Juha Karhunen, et al. Generalizations of principal component analysis, optimization problems, and neural networks, 1995, Neural Networks.

[15] Jason Weston, et al. Large-scale kernel machines, 2007.

[16] Michael I. Jordan, et al. Learning with Mixtures of Trees, 2001, J. Mach. Learn. Res.

[17] Erkki Oja, et al. A comparison of neural ICA algorithms using real-world data, 1999, Proceedings of the International Joint Conference on Neural Networks (IJCNN'99).

[18] D. L. Massart, et al. A journey into low-dimensional spaces with autoassociative neural networks, 2003, Talanta.

[19] Igor V. Tetko, et al. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis, 2008, J. Chem. Inf. Model.

[20] I. K. Fodor, et al. A Survey of Dimension Reduction Techniques, 2002.

[21] Yoshua Bengio, et al. Scaling learning algorithms towards AI, 2007.

[22] H. Bourlard, et al. Auto-association by multilayer perceptrons and singular value decomposition, 1988, Biological Cybernetics.

[23] Erkki Oja, et al. A class of neural networks for independent component analysis, 1997, IEEE Trans. Neural Networks.

[24] Pascal Vincent, et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training, 2009, AISTATS.

[25] Its'hak Dinstein, et al. A comparative study of neural network based feature extraction paradigms, 1999, Pattern Recognit. Lett.

[26] Lipo Wang, et al. Back-propagation with chaos, 2008, 2008 International Conference on Neural Networks and Signal Processing.

[27] Matthias Scholz, et al. Non-linear PCA: a missing data approach, 2005, Bioinformatics.

[28] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[29] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.

[30] Yann LeCun, et al. Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning scheme for asymmetric threshold networks), 1985.

[31] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[32] Nathalie Japkowicz, et al. Nonlinear Autoassociation Is Not Equivalent to PCA, 2000, Neural Computation.

[33] Mark A. Kramer, et al. Improvement of the backpropagation algorithm for training neural networks, 1990.

[34] Igor V. Tetko, et al. Critical Assessment of QSAR Models of Environmental Toxicity against Tetrahymena pyriformis: Focusing on Applicability Domain and Overfitting by Variable Selection, 2008, J. Chem. Inf. Model.

[35] Geoffrey E. Hinton. Learning multiple layers of representation, 2007, Trends in Cognitive Sciences.

[36] Its'hak Dinstein, et al. Feature extraction by neural network nonlinear mapping for pattern classification, 1996, Proceedings of the 13th International Conference on Pattern Recognition.

[37] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[38] Igor V. Tetko, et al. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis, 2008, J. Chem. Inf. Model.

[39] James V. Stone, et al. On the relative time complexities of standard and conjugate gradient backpropagation, 1994, Proceedings of the 1994 IEEE International Conference on Neural Networks (ICNN'94).

[40] Ru-Qin Yu, et al. Neural network learning to non-linear principal component analysis, 1996.

[41] E. Oja. From neural learning to independent components, 1998, Neurocomputing.

[42] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines, 1985, Cogn. Sci.

[43] J. van Leeuwen, et al. Neural Networks: Tricks of the Trade, 2002, Lecture Notes in Computer Science.