Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction

Automatic classification is one of the basic tasks required in any pattern recognition and human-computer interaction application. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that, if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis for a specific type of probabilistic classifier, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection.
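As a rough illustration of what training a probabilistic classifier with labeled and unlabeled data can look like, the sketch below fits a two-class generative model with one-dimensional Gaussian class-conditional densities by expectation-maximization (EM). It is not the structure learning algorithm proposed in the paper. The choice of EM, the Gaussian model, the function names (`fit_semisupervised_em`, `classify`), and the synthetic data are all assumptions made for this example. The demo data are drawn from the same Gaussian family the model assumes, so it shows only the favorable case.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def m_step(xs, resp, n_classes):
    """Re-estimate priors, means, and variances from (soft) class memberships."""
    total = sum(sum(r) for r in resp)
    priors, means, variances = [], [], []
    for k in range(n_classes):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
        priors.append(nk / total)
        means.append(mean)
        variances.append(var + 1e-3)  # small floor keeps the variance positive
    return priors, means, variances


def posterior(x, priors, means, variances):
    """Posterior class probabilities for one observation under the current model."""
    joint = [p * gaussian_pdf(x, m, v) for p, m, v in zip(priors, means, variances)]
    z = sum(joint) or 1e-300  # guard against numerical underflow
    return [j / z for j in joint]


def fit_semisupervised_em(labeled, unlabeled, n_classes=2, n_iter=50):
    """EM over labeled (x, y) pairs and unlabeled x values.

    Assumes every class has at least one labeled example.
    """
    x_lab = [x for x, _ in labeled]
    # Labeled points keep fixed one-hot memberships throughout.
    r_lab = [[1.0 if y == k else 0.0 for k in range(n_classes)] for _, y in labeled]
    params = m_step(x_lab, r_lab, n_classes)  # initialise from labeled data only
    xs = x_lab + list(unlabeled)
    for _ in range(n_iter):
        # E-step: soft class memberships for the unlabeled points.
        r_unl = [posterior(x, *params) for x in unlabeled]
        # M-step: re-estimate parameters from labeled and unlabeled points together.
        params = m_step(xs, r_lab + r_unl, n_classes)
    return params


def classify(x, priors, means, variances):
    """Return the index of the most probable class for x."""
    post = posterior(x, priors, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Synthetic demo (hypothetical data): a few labeled points, many unlabeled ones.
rng = random.Random(0)
labeled = [(rng.gauss(-2.0, 1.0), 0) for _ in range(5)]
labeled += [(rng.gauss(2.0, 1.0), 1) for _ in range(5)]
unlabeled = [rng.gauss(-2.0, 1.0) for _ in range(300)]
unlabeled += [rng.gauss(2.0, 1.0) for _ in range(300)]

priors, means, variances = fit_semisupervised_em(labeled, unlabeled)
print("estimated means:", means)
```

To probe the unfavorable case informally, one could draw the unlabeled points from a distribution the Gaussian model cannot represent and compare the resulting classifier against one fitted on the labeled points alone.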
