Soft linear discriminant analysis (SLDA) for pattern recognition with ambiguous reference labels: Application to social signal processing

While most pattern recognition approaches are designed and trained with clearly defined reference labels, a few new applications work with ambiguous ones. Linear discriminant analysis (LDA) is one of the most widely used methods in pattern recognition for reducing the dimensionality of feature vectors, which typically increases the robustness of the features. In this work we therefore propose a modification of LDA that handles ambiguous reference labels in a soft-decision way. In the field of social signal processing (here: emotion recognition), we demonstrate on a soft-labeled emotional speech database that a soft accuracy measure, which evaluates the classifier's confidence output, provides a degree of similarity to the (naturally ambiguous) human votes. The classifier is adapted to this soft accuracy measure by retraining it with respect to the human vote distribution. When this soft accuracy measure is applied to emotion recognition with ambiguous reference labels, retraining the classifier and using the new soft LDA method together lead to a relative accuracy increase of around 22%.
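
The abstract does not spell out the SLDA equations, so the following is only a minimal sketch of one plausible way to make LDA accept soft labels, not the authors' formulation. It assumes each sample carries a row of class membership weights (for example, normalized human vote shares) and uses these weights in the class means and scatter matrices. The function name `soft_lda`, the regularization constant, and the toy data are all hypothetical. With one-hot weights the scatter matrices reduce to those of classical LDA.

```python
import numpy as np

def soft_lda(X, P, n_components):
    """Sketch of an LDA projection with soft labels (an assumed formulation).

    X: (n, d) feature matrix.
    P: (n, K) soft labels; row i holds the membership weights of sample i.
    Returns a (d, n_components) projection matrix.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    d = X.shape[1]

    Nk = P.sum(axis=0)                  # soft class counts, shape (K,)
    means = (P.T @ X) / Nk[:, None]     # weighted class means, shape (K, d)
    mu = (Nk @ means) / Nk.sum()        # weighted global mean, shape (d,)

    # Within-class scatter: each sample contributes to every class,
    # in proportion to its membership weight.
    Sw = np.zeros((d, d))
    for k in range(P.shape[1]):
        D = X - means[k]
        Sw += (D * P[:, [k]]).T @ D

    # Between-class scatter, weighted by the soft class counts.
    M = means - mu
    Sb = (M * Nk[:, None]).T @ M

    # Small ridge term (an assumption) to keep Sw well conditioned.
    Sw += 1e-6 * np.trace(Sw) / d * np.eye(d)

    # Whiten Sw, then diagonalize Sb in the whitened space.
    evals, evecs = np.linalg.eigh(Sw)
    whiten = evecs / np.sqrt(evals)
    evals_b, evecs_b = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(evals_b)[::-1][:n_components]
    return whiten @ evecs_b[:, order]

# Toy data: two Gaussian clusters with made-up vote shares as soft labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
P = np.vstack([np.tile([0.8, 0.2], (50, 1)), np.tile([0.3, 0.7], (50, 1))])
W = soft_lda(X, P, n_components=1)
Z = X @ W  # features projected onto the single discriminant direction
```

Solving the problem by whitening keeps both eigendecompositions symmetric, which avoids the complex-valued results that a general eigenvalue solver can return for the non-symmetric product of the inverse within-class scatter and the between-class scatter.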
