Sub-lexical Dialogue Act Classification in a Spoken Dialogue System Support for the Elderly with Cognitive Disabilities

This paper presents a dialogue act classifier for a spoken dialogue system that delivers necessary information to elderly subjects with mild dementia. Lexical features have been shown to be effective for dialogue act classification, but automatically transcribing spontaneous speech requires costly language modeling. This paper therefore proposes a classifier that needs no language model and uses sub-lexical features in place of lexical ones. The classifier operates on phoneme sequences produced by a phoneme recognizer and exhaustively analyzes the saliency of all possible sub-sequences using a support vector machine with a string kernel. In an empirical study on a dialogue corpus containing elderly speech, the sub-lexical classifier was robust to poor language modeling and outperformed a lexical classifier that used hidden Markov models of words.

Index Terms: dialogue acts, support vector machines, string kernels, spontaneous speech, elderly speech, dementia
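To make the idea of comparing utterances through shared phoneme sub-sequences concrete, the following is a minimal sketch of one simple string kernel. It rests on assumptions that do not come from the paper: it counts only contiguous phoneme n-grams up to a hypothetical `max_len`, whereas the kernel used in the paper may differ (for example, it may allow gaps or weight sub-sequences differently). The function names and the example phoneme sequences are illustrative only.

```python
from collections import Counter
from math import sqrt


def substring_counts(phonemes, max_len=3):
    """Count every contiguous phoneme n-gram of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return counts


def string_kernel(a, b, max_len=3):
    """Inner product of the two n-gram count vectors (assumed kernel form)."""
    counts_a = substring_counts(a, max_len)
    counts_b = substring_counts(b, max_len)
    # Counter returns 0 for n-grams that do not occur in b.
    return sum(value * counts_b[gram] for gram, value in counts_a.items())


def normalized_kernel(a, b, max_len=3):
    """Cosine-normalized kernel so that sequence length matters less."""
    k_ab = string_kernel(a, b, max_len)
    k_aa = string_kernel(a, a, max_len)
    k_bb = string_kernel(b, b, max_len)
    if k_aa == 0 or k_bb == 0:
        return 0.0
    return k_ab / sqrt(k_aa * k_bb)


def gram_matrix(sequences, max_len=3):
    """Pairwise kernel values for a list of phoneme sequences."""
    return [[normalized_kernel(s, t, max_len) for t in sequences]
            for s in sequences]


# Hypothetical phoneme sequences, as a phoneme recognizer might emit them.
utterances = [
    ["n", "a", "N", "j", "i"],
    ["n", "a", "N", "d", "e"],
    ["h", "a", "i"],
]
K = gram_matrix(utterances)
```

A Gram matrix such as `K` is the form in which a kernel is usually handed to a support vector machine that accepts precomputed kernels; the training step itself is omitted here because the paper's labels and settings are not given in this text.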
