Two-Layer Contractive Encodings with Linear Transformation of Perceptrons for Semi-Supervised Learning

It is difficult to train a multi-layer perceptron (MLP) when only a few labeled samples are available. However, by pretraining the MLP on the large amounts of unlabeled samples that are typically available, we may achieve better generalization performance. Schulz et al. (2012) showed that it is possible to pretrain an MLP in a less greedy way using two-layer contractive encodings, albeit at the cost of a more difficult optimization problem. Raiko et al. (2012), on the other hand, proposed a scheme that makes the optimization problem in deep networks considerably easier. In this paper, we show that it is beneficial to combine these two approaches.
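As a rough illustration of what is being combined, the sketch below is a minimal NumPy example, not code from either paper: the function names, the per-unit parameters alpha/beta, and the exact form of the penalty are our own choices. It builds a two-layer encoder whose tanh nonlinearities carry extra linear and constant terms in the spirit of Raiko et al. (2012), and computes a two-layer contractive penalty in the spirit of Schulz et al. (2012), i.e. the squared Frobenius norm of the Jacobian of the whole two-layer encoding with respect to the input.

    import numpy as np

    def g(z, alpha, beta):
        # Transformed nonlinearity in the spirit of Raiko et al. (2012):
        # tanh plus a per-unit linear term alpha*z and constant term beta
        # (parameter names are ours, not from the paper).
        return np.tanh(z) + alpha * z + beta

    def g_prime(z, alpha):
        # Derivative of g with respect to its pre-activation z.
        return 1.0 - np.tanh(z) ** 2 + alpha

    def two_layer_encode(x, W1, b1, W2, b2, a1, c1, a2, c2):
        # Two-layer encoding h2 = g(W2 g(W1 x + b1) + b2).
        z1 = W1 @ x + b1
        h1 = g(z1, a1, c1)
        z2 = W2 @ h1 + b2
        h2 = g(z2, a2, c2)
        return z1, h1, z2, h2

    def contractive_penalty(x, W1, b1, W2, b2, a1, c1, a2, c2):
        # Squared Frobenius norm of the Jacobian dh2/dx of the whole
        # two-layer encoding: the contraction is measured through both
        # layers jointly rather than one layer at a time.
        z1, _, z2, _ = two_layer_encode(x, W1, b1, W2, b2, a1, c1, a2, c2)
        J1 = g_prime(z1, a1)[:, None] * W1   # dh1/dx,  shape (n_h1, n_x)
        J2 = g_prime(z2, a2)[:, None] * W2   # dh2/dh1, shape (n_h2, n_h1)
        J = J2 @ J1                          # dh2/dx by the chain rule
        return float(np.sum(J ** 2))

    # Toy usage: random weights, one 20-dimensional input.
    rng = np.random.default_rng(0)
    n_x, n_h1, n_h2 = 20, 10, 5
    x = rng.standard_normal(n_x)
    W1, b1 = 0.1 * rng.standard_normal((n_h1, n_x)), np.zeros(n_h1)
    W2, b2 = 0.1 * rng.standard_normal((n_h2, n_h1)), np.zeros(n_h2)
    a1, c1 = np.zeros(n_h1), np.zeros(n_h1)  # per-unit linear/constant terms
    a2, c2 = np.zeros(n_h2), np.zeros(n_h2)
    print(contractive_penalty(x, W1, b1, W2, b2, a1, c1, a2, c2))

Roughly speaking, a penalty of this kind would be added to the unsupervised pretraining cost, while the added linear and constant terms are what Raiko et al. (2012) use to make the optimization easier.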

[1] Hannes Schulz, et al. Learning Two-Layer Contractive Encodings, 2012, ICANN.

[2] Bernhard Schölkopf, et al. Introduction to Semi-Supervised Learning, 2006, Semi-Supervised Learning.

[3] Marc'Aurelio Ranzato, et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, 2007, CVPR.

[4] Marc'Aurelio Ranzato, et al. Efficient Learning of Sparse Representations with an Energy-Based Model, 2006, NIPS.

[5] Yann LeCun, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[6] Yoshua Bengio, et al. Algorithms for Hyper-Parameter Optimization, 2011, NIPS.

[7] Pascal Vincent, et al. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, 2011, ICML.

[8] Ulrike von Luxburg, et al. Proceedings of the 28th International Conference on Machine Learning, ICML 2011, 2011, ICML.

[9] Tapani Raiko, et al. Deep Learning Made Easier by Linear Transformations in Perceptrons, 2012, AISTATS.

[10] Yoshua Bengio, et al. Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.

[11] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.

[12] Pascal Vincent, et al. Representation Learning: A Review and New Perspectives, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Yoshua Bengio, et al. Greedy Layer-Wise Training of Deep Networks, 2007, NIPS.

[14] Pascal Vincent, et al. Higher Order Contractive Auto-Encoder, 2011, ECML/PKDD.

[15] Tapani Raiko, et al. Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities, 2013, ICLR.

[16] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[17] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[18] Alexander Zien, et al. Semi-Supervised Learning, 2006, MIT Press.