A Semi-Supervised Learning Algorithm for Multi-Layered Perceptrons

We address the problem of training multi-layered perceptrons (MLPs) in a discriminative, inductive, multiclass, parametric, and semi-supervised fashion. We introduce a novel objective function that, when optimized, simultaneously encourages 1) accuracy on the labeled points, 2) respect for an underlying graph-represented manifold on all points, 3) smoothness via an entropic regularizer of the classifier outputs, and 4) simplicity via an ℓ2 regularizer. Our approach provides a simple, elegant, and computationally efficient way to bring the benefits of semi-supervised learning (and of what is typically an enormous amount of unlabeled training data) to MLPs, which are among the most widely used pattern classifiers in practice. The objective can be optimized efficiently with stochastic gradient descent, even on large datasets. Results demonstrate significant improvements over both a baseline supervised MLP and a previous non-parametric manifold-regularized reproducing kernel Hilbert space classifier.
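To make the four-term objective concrete, here is a minimal NumPy sketch of how such a loss could be evaluated. The abstract does not give the exact functional form, so everything specific here is an assumption: cross-entropy for the labeled term, a graph-weighted KL divergence between the outputs of neighboring points for the manifold term, KL divergence from the uniform distribution for the entropic term, and the names `ssl_objective`, `gamma`, `kappa`, and `lam` are hypothetical. It only evaluates the objective; training by stochastic gradient descent is not shown.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence along the last axis, with a small eps for stability."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def ssl_objective(probs, labels, labeled_idx, W, params,
                  gamma=1.0, kappa=0.1, lam=1e-3):
    """Illustrative four-term semi-supervised objective (assumed form).

    probs       : (n, c) classifier output distributions for all points
    labels      : (m,) integer class labels of the labeled points
    labeled_idx : (m,) indices of the labeled points within probs
    W           : (n, n) non-negative graph affinity matrix
    params      : list of weight arrays of the MLP
    """
    n, c = probs.shape
    eps = 1e-12
    # 1) Accuracy on labeled points: cross-entropy against the true labels.
    supervised = -np.sum(np.log(probs[labeled_idx, labels] + eps))
    # 2) Graph term: neighbors on the graph should have similar outputs.
    pairwise = kl(probs[:, None, :], probs[None, :, :])  # shape (n, n)
    graph = np.sum(W * pairwise)
    # 3) Entropic term: KL from uniform, so minimizing favors higher entropy.
    uniform = np.full(c, 1.0 / c)
    entropic = np.sum(kl(probs, uniform))
    # 4) Simplicity: squared l2 norm of all parameters.
    l2 = sum(np.sum(w ** 2) for w in params)
    return supervised + gamma * graph + kappa * entropic + lam * l2
```

In this sketch, `gamma`, `kappa`, and `lam` trade off the manifold, entropic, and ℓ2 terms against the supervised loss. Because the objective is a sum over points and graph edges, it can in principle be minimized with mini-batch stochastic gradient descent, in keeping with the claim above.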
