Deep Neural Network Self-training Based on Unsupervised Learning and Dropout

In supervised learning, a large amount of labeled data is required to find reliable classification boundaries when training a classifier. In practice, however, labeled data is hard to obtain, and labeling is time-consuming and costly. Although unlabeled data is far more plentiful than labeled data, most supervised learning methods are not designed to exploit it. Self-training is a semi-supervised learning method that alternates between training a base classifier and labeling unlabeled data in the training set. Most self-training methods adopt confidence measures to select confidently labeled examples, because high confidence usually implies low error. A major difficulty of self-training is error amplification: if a classifier misclassifies some examples and those examples are added to the labeled training set, the next classifier may learn improper classification boundaries and generate even more misclassified examples. Because base classifiers are built from a small labeled dataset, they rarely achieve good generalization performance. Even with an improved training procedure and better classifiers, errors are inevitable, so self-labeled data must be corrected to avoid error amplification in subsequent classifiers. In this paper, we propose a deep neural network based approach that alleviates these problems of self-training by combining three schemes: unsupervised pre-training, dropout, and error forgetting. Applied to various datasets, a classifier trained with our approach outperforms a classifier trained with common self-training.
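The core loop of confidence-based self-training described above can be summarized in a short sketch. The snippet below is a minimal illustration, assuming a PyTorch MLP with dropout, a fixed confidence threshold, and an interpretation of error forgetting as regenerating all pseudo-labels from scratch in every round rather than accumulating them across rounds; the unsupervised pre-training step is omitted, and the network size, threshold, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

    # Minimal sketch of confidence-based self-training with dropout (PyTorch).
    # Assumptions: MLP architecture, threshold of 0.95, and "error forgetting"
    # modeled as rebuilding the pseudo-labeled pool every round.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MLP(nn.Module):
        def __init__(self, n_in, n_hidden, n_classes, p_drop=0.5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(n_hidden, n_classes),
            )

        def forward(self, x):
            return self.net(x)

    def train(model, x, y, epochs=50, lr=1e-3):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()  # dropout active during training
        for _ in range(epochs):
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()

    def self_train(x_lab, y_lab, x_unlab, n_classes, rounds=5, threshold=0.95):
        # Start with no pseudo-labels; they are regenerated ("forgotten") each round
        # so that earlier mistakes are not locked into the training set.
        mask = torch.zeros(len(x_unlab), dtype=torch.bool)
        pseudo = torch.zeros(len(x_unlab), dtype=torch.long)
        model = None
        for _ in range(rounds):
            model = MLP(x_lab.shape[1], 128, n_classes)
            x_aug = torch.cat([x_lab, x_unlab[mask]])
            y_aug = torch.cat([y_lab, pseudo[mask]])
            train(model, x_aug, y_aug)
            model.eval()  # disables dropout for prediction
            with torch.no_grad():
                probs = F.softmax(model(x_unlab), dim=1)
            conf, pseudo = probs.max(dim=1)
            mask = conf >= threshold  # keep only confident pseudo-labels for the next round
        return model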
