Novel acoustic features for automatic dialog-act tagging

This paper presents 57 new acoustic features for automatic dialog-act tagging. The features are intended to be richer than, and complementary to, the traditional cumulative statistics of intonation. Our contributions include feature normalization with respect to neighboring utterances, incorporation of periodicity and formant features, modeling of cognitive phenomena such as hesitations, and utterance-level aggregation of short-term acoustic effects. The proposed features are applied to 3-way dialog-act tagging and question detection on two databases (British-English call-center conversations and Switchboard) and compared with a popular cumulative-statistics baseline using logistic-regression models. The proposed features perform significantly better than the baseline and are complementary to it, with an average absolute performance gain of about 5-6%. Combined feature ranking reveals that about 75% of the top 20 features belong to the proposed feature set, and that the two corpora differ in their feature preferences despite similar overall performance.
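
As an illustration of what normalization with respect to neighboring utterances could look like, the sketch below subtracts the mean of the surrounding utterances from each utterance-level feature value. This is not the paper's formulation: the function name `normalize_by_neighbors`, the symmetric `window` parameter, and the choice of mean subtraction are all assumptions made for the example.

```python
import numpy as np

def normalize_by_neighbors(values, window=2):
    """Express each utterance's feature value relative to its neighbors.

    values: one utterance-level feature value (e.g. mean pitch) per
            utterance, in conversation order.
    window: number of utterances on each side used as context
            (hypothetical parameter, not taken from the paper).
    """
    values = np.asarray(values, dtype=float)
    out = np.zeros_like(values)
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        # Context is the neighboring utterances, excluding the current one.
        context = np.concatenate([values[lo:i], values[i + 1:hi]])
        if context.size == 0:
            continue  # no neighbors available: leave the value at 0
        out[i] = values[i] - context.mean()
    return out

# Hypothetical mean-pitch values (Hz) for five consecutive utterances.
pitch = [100, 100, 200, 100, 100]
relative_pitch = normalize_by_neighbors(pitch, window=1)
```

Under these assumptions, the third utterance stands out with a large positive value because its pitch is well above that of its immediate neighbors, which is the kind of context-relative cue a classifier could use alongside the raw statistic.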
