Semisupervised Learning of Classifiers With Application to Human-Computer Interaction

As computers and computing objects are built into more of the everyday tools that people use, intelligent human-computer interaction is seen as a necessary step toward making computers better aid the human user. Designing good interaction between humans and machines involves many tasks. One basic task, common to many such applications, is automatic classification by the machine. A classifier can be designed by domain experts or learned from training data, and the training data can be labeled with the different classes or unlabeled. In this work I focus on training probabilistic classifiers with labeled and unlabeled data. I show under what conditions unlabeled data can be used to improve classification performance. I also show that when these conditions are violated, using unlabeled data is often detrimental to classification performance. I discuss the implications of this analysis for learning a specific type of probabilistic classifier, namely Bayesian networks, and propose structure learning algorithms that can potentially utilize unlabeled data to improve classification. I show how the theory and algorithms are successfully applied in two applications related to human-computer interaction: facial expression recognition and face detection.
