Active Learning for Speech Event Detection in HCI

In this work, a pool-based active learning approach that combines outlier detection methods with uncertainty sampling is proposed for speech event detection. Events in this case are atypical utterances (e.g. laughter, heavy breathing) that occur sporadically during a Human-Computer Interaction (HCI) scenario. The proposed approach uses rank aggregation to select informative speech segments that have previously been ranked by several outlier detection techniques and by an uncertainty sampling technique. The uncertainty sampling method is based on the distance to the decision boundary of a Support Vector Machine with a Radial Basis Function kernel trained on the available annotated samples. Experimental results support the effectiveness of the proposed approach.
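
The selection step described above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: Borda count stands in for the rank aggregation rule, outlier scores are taken to grow with atypicality, and the function names (`ranks_from_scores`, `borda_aggregate`, `select_queries`) are hypothetical. The outlier scores and SVM boundary distances are passed in as plain lists; in practice they might come from, for example, the `decision_function` of a scikit-learn `SVC(kernel="rbf")`, but that is also an assumption.

```python
def ranks_from_scores(scores, higher_is_more_informative=True):
    """Return the rank of each segment (0 = most informative).

    Ties are broken by segment index, because Python's sort is stable.
    """
    order = sorted(
        range(len(scores)),
        key=lambda i: scores[i],
        reverse=higher_is_more_informative,
    )
    ranks = [0] * len(scores)
    for position, segment in enumerate(order):
        ranks[segment] = position
    return ranks


def borda_aggregate(rankings):
    """Combine several rankings with a Borda count (assumed rule).

    Each ranking awards (n - 1 - rank) points to a segment; segments are
    returned from most to least points, ties broken by segment index.
    """
    n = len(rankings[0])
    points = [0] * n
    for ranks in rankings:
        for segment, rank in enumerate(ranks):
            points[segment] += n - 1 - rank
    return sorted(range(n), key=lambda i: (-points[i], i))


def select_queries(outlier_scores_per_detector, svm_boundary_distances, batch_size):
    """Pick the segments to send to the annotator in one active learning round."""
    # One ranking per outlier detector: a higher score means more atypical.
    rankings = [ranks_from_scores(s) for s in outlier_scores_per_detector]
    # Uncertainty sampling: a smaller absolute distance to the SVM
    # decision boundary means the classifier is less certain.
    uncertainty = [abs(d) for d in svm_boundary_distances]
    rankings.append(ranks_from_scores(uncertainty, higher_is_more_informative=False))
    return borda_aggregate(rankings)[:batch_size]
```

With two hypothetical detectors scoring five unlabeled segments, `select_queries([[0.1, 0.9, 0.3, 0.7, 0.2], [0.2, 0.8, 0.1, 0.9, 0.3]], [1.5, -0.2, 0.9, 0.1, -1.2], 2)` selects segments 3 and 1, which rank high under both detectors and also lie close to the decision boundary.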
