System request detection in conversation based on acoustic and speaker alternation features

For a hands-free speech interface, it is important to detect commands among spontaneous utterances. To discriminate system requests from human-human conversation using acoustic features, it is effective to examine the head and the tail of an utterance, because the differences between system requests and spontaneous utterances appear in these parts. Experiments show that extracting features separately from the head and the tail of an utterance improves detection accuracy. Taking the alternation of speakers into account, using two-channel microphones, also improves performance. Although detection based on linguistic features alone already achieves high accuracy, combining them with acoustic and turn-taking features raises performance further.

Index Terms: system request detection, utterance verification, SVM, speech recognition, turn-taking
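As a rough illustration of the head/tail idea, the sketch below summarizes the first and last frames of an utterance separately and appends two turn-taking cues. It is a minimal sketch under stated assumptions, not the feature set used in the experiments: the per-frame features (for example energy and F0), the `edge` window length, and the `gap_before` / `gap_after` silence durations are all hypothetical choices made for the example.

```python
from statistics import mean


def head_tail_features(frames, edge=10):
    """Average each feature dimension over the head and the tail separately.

    frames: list of per-frame feature vectors, e.g. [energy, f0] (assumed).
    edge:   number of frames treated as head and as tail (hypothetical value).
    """
    if not frames:
        raise ValueError("utterance has no frames")
    head = frames[:edge]    # first `edge` frames
    tail = frames[-edge:]   # last `edge` frames
    dims = range(len(frames[0]))
    head_mean = [mean(f[d] for f in head) for d in dims]
    tail_mean = [mean(f[d] for f in tail) for d in dims]
    return head_mean + tail_mean


def utterance_vector(frames, gap_before, gap_after, edge=10):
    """Concatenate head/tail acoustic features with turn-taking cues.

    gap_before / gap_after: assumed silence (seconds) between the other
    speaker's speech and this utterance, as a stand-in for speaker alternation.
    """
    return head_tail_features(frames, edge) + [gap_before, gap_after]


if __name__ == "__main__":
    frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    print(utterance_vector(frames, gap_before=0.3, gap_after=1.2, edge=2))
```

A vector built this way could then be passed to an SVM classifier that labels each utterance as a system request or as conversation; the classifier itself is omitted here.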
