MENTAL STATE DETECTION OF DIALOGUE SYSTEM USERS VIA SPOKEN LANGUAGE

This paper presents an approach to simulate the mental activities of children during their interaction with computers through their spoken language. The mental activities are categorized into three states: confidence, confusion and frustration. Two knowledge sources are used in the detection. One is prosody, which indicates the utterance type and the user's attitude. The other is embedded key words/phrases, which help interpret the utterances. Moreover, it is found that children's speech exhibits very different acoustic characteristics from those of adults. Given the uniqueness of children's speech, this paper applies a vocal-tract-length-normalization (VTLN)-based technique to compensate for both inter-speaker and intra-speaker variability in children's speech. The detected key words/phrases are then integrated with prosodic information as the cues for the maximum a posteriori (MAP) decision of mental states. Tests on a set of 50 utterances collected from the project experiment yielded a classification accuracy of 74%.
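As an illustration of the final step, a MAP decision over the three mental states can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes the prosodic and key word/phrase cues are conditionally independent given the state, and the function name `map_decision` and all probability values are hypothetical, chosen only to show the arithmetic.

```python
import math

STATES = ("confidence", "confusion", "frustration")

def map_decision(prior, prosody_lik, keyword_lik):
    """Pick the state s maximizing P(s) * P(prosody | s) * P(keywords | s).

    Assumption (not from the paper): the two cues are conditionally
    independent given the state, so their likelihoods multiply.
    Scores are computed in the log domain for numerical stability.
    """
    scores = {
        s: math.log(prior[s]) + math.log(prosody_lik[s]) + math.log(keyword_lik[s])
        for s in STATES
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical values for one utterance (illustration only).
prior = {"confidence": 0.5, "confusion": 0.3, "frustration": 0.2}
prosody_lik = {"confidence": 0.2, "confusion": 0.6, "frustration": 0.2}
keyword_lik = {"confidence": 0.1, "confusion": 0.7, "frustration": 0.2}

state, scores = map_decision(prior, prosody_lik, keyword_lik)
print(state)
```

With these made-up numbers the unnormalized posteriors are 0.010, 0.126 and 0.008, so the two cues together outweigh the prior that favors confidence.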
