Does affect affect automatic recognition of children's speech?

The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of these two phenomena: extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous angry, motherese, and emphatic children's speech as opposed to neutral speech. The experiments mainly address the questions of how specific emotions influence word accuracy, and whether neutral speech material is sufficient for training, as opposed to matched-conditions acoustic model adaptation. In the result, emphatic and angry speech are recognised best, while neutral speech proves a good choice for training. To discuss this effect, we further visualise the emotion distribution in MFCC space by means of the Sammon transformation.
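The Sammon transformation referred to above is a classic nonlinear mapping (Sammon, 1969) that projects high-dimensional feature vectors, here MFCC frames, into two dimensions while preserving the pairwise distances of the original space as closely as possible. The following is a minimal NumPy sketch only, not the paper's implementation: it minimises Sammon's stress by plain gradient descent from a PCA initialisation, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def sammon_map(X, dim=2, n_iter=300, lr=0.3):
    """Project the rows of X (n_samples, n_features) into `dim` dimensions
    by minimising Sammon's stress
        E = (1/c) * sum_{i<j} (D_ij - d_ij)^2 / D_ij
    with gradient descent, where D are distances in the original space
    (e.g. MFCC space) and d are distances in the projection."""
    n = X.shape[0]
    # pairwise distances in the original feature space
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, 1.0)            # diagonal never contributes to E
    D = np.maximum(D, 1e-12)            # guard against duplicate points
    c = D[np.triu_indices(n, 1)].sum()  # normalising constant of the stress

    # PCA initialisation keeps the first gradient steps well-conditioned
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:dim].T

    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.sqrt((diff ** 2).sum(-1))
        np.fill_diagonal(d, 1.0)
        # gradient of E w.r.t. each projected point y_i
        w = (D - d) / (D * d)
        grad = (-2.0 / c) * (w[:, :, None] * diff).sum(axis=1)
        Y = Y - lr * grad
    return Y
```

Because the stress divides each squared error by D_ij, small original distances are weighted most heavily, which is why the mapping tends to keep local neighbourhood structure, such as clusters of same-emotion frames, visible in the 2-D plot.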
