Cooperative Learning and its Application to Emotion Recognition from Speech

In this paper, we propose a novel method for the highly efficient exploitation of unlabeled data: Cooperative Learning. Our approach combines Active Learning and Semi-Supervised Learning techniques with the aim of reducing the cost of human annotation. The core idea of Cooperative Learning is to share the labeling work efficiently between human and machine: instances predicted with insufficient confidence are labeled by a human, while those predicted with high confidence are labeled by the machine. We conducted various test runs on two emotion recognition tasks with a varying number of initial supervised training instances and two different feature sets. The results show that Cooperative Learning consistently outperforms the individual Active and Semi-Supervised Learning techniques in all test cases. In particular, we show that our method based on the combination of Active Learning and Co-Training achieves the same performance as a model trained on the whole training set while using 75% fewer labeled instances. Our method therefore efficiently and robustly reduces the need for human annotation.
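The confidence-based split between human and machine labeling described above can be sketched as follows. This is a minimal illustration, assuming two confidence thresholds; the function and parameter names are hypothetical, and the paper's actual selection criteria and iteration scheme may differ.

```python
def cooperative_split(predictions, low_thr, high_thr):
    """Partition unlabeled instances by classifier confidence.

    predictions: list of (instance_id, predicted_label, confidence) tuples.
    Instances below low_thr are routed to the human annotator; those at
    or above high_thr are machine-labeled with the predicted label; the
    remainder are deferred to a later learning round.
    (Thresholds and routing policy are illustrative assumptions.)
    """
    human_queue, machine_labeled, deferred = [], [], []
    for inst, label, conf in predictions:
        if conf < low_thr:
            human_queue.append(inst)               # costly human annotation
        elif conf >= high_thr:
            machine_labeled.append((inst, label))  # trusted machine label
        else:
            deferred.append(inst)                  # revisit in a later round
    return human_queue, machine_labeled, deferred
```

In a full Cooperative Learning loop, the human-labeled and machine-labeled instances would both be added to the training set, the classifier retrained, and the deferred pool re-scored, repeating until the unlabeled pool or the annotation budget is exhausted.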
