Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems

This paper presents a case study of cross-lingual transfer learning applied to affective computing in the domain of spoken dialogue systems. Prosodic features of correction dialog acts are modeled on a group of languages and compared with languages excluded from that analysis. Speech in several languages was recorded in carefully staged Wizard-of-Oz experiments; however, a balanced distribution of speakers per language could not be ensured. To assess the feasibility of cross-lingual transfer learning and to ensure reliable classification of corrections independently of language, we employed different machine learning approaches along with relevant acoustic-prosodic feature sets. The results of the mono-lingual experiments (trained and tested on a single language) and the cross-lingual experiments (trained on several languages and tested on the remaining ones) were analyzed and compared in terms of accuracy and F1 score.
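To make the evaluation protocol concrete, below is a minimal sketch of a leave-one-language-out setup of the kind described above, assuming (for illustration only) utterance-level acoustic-prosodic functionals such as eGeMAPS-style features and off-the-shelf scikit-learn classifiers (Random Forest and an RBF-kernel SVM). The data arrays, language tags, and the `leave_one_language_out` helper are hypothetical placeholders, not the authors' actual pipeline.

```python
# Hypothetical sketch of the cross-lingual protocol: train on all languages
# except one, test on the held-out language, and report accuracy and F1 for a
# binary correction vs. non-correction classifier. Feature extraction (e.g.
# eGeMAPS-style functionals) is assumed to have been done beforehand.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline


def leave_one_language_out(X, y, languages, make_clf):
    """Train on all languages except the held-out one; test on the held-out one."""
    results = {}
    for held_out in np.unique(languages):
        train_mask = languages != held_out
        test_mask = ~train_mask
        clf = make_pipeline(StandardScaler(), make_clf())
        clf.fit(X[train_mask], y[train_mask])
        y_pred = clf.predict(X[test_mask])
        results[held_out] = {
            "accuracy": accuracy_score(y[test_mask], y_pred),
            "f1": f1_score(y[test_mask], y_pred),
        }
    return results


if __name__ == "__main__":
    # Placeholder data: 88-dimensional feature vectors per utterance, binary
    # labels (1 = correction, 0 = other dialog act), and per-utterance language tags.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 88))
    y = rng.integers(0, 2, size=400)
    languages = np.array(["de", "en", "ru", "ko"] * 100)

    classifiers = [
        ("RandomForest", lambda: RandomForestClassifier(n_estimators=200)),
        ("SVM-RBF", lambda: SVC(kernel="rbf", C=1.0)),
    ]
    for name, make_clf in classifiers:
        scores = leave_one_language_out(X, y, languages, make_clf)
        for lang, s in scores.items():
            print(f"{name} | held-out={lang} | acc={s['accuracy']:.2f} | f1={s['f1']:.2f}")
```

With real data, the per-language partitions would come from the Wizard-of-Oz recordings, and the same loop could serve the mono-lingual baseline by restricting both training and test masks to a single language.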
