Prosody-based sentence boundary detection in Chinese broadcast news

In this paper, we explore the use of prosodic features for sentence boundary detection in Chinese broadcast news. The prosodic features include speaker turns, music, pause duration, pitch, energy and speaking rate. In particular, because Chinese lexical tones shape the pitch trajectory, we propose tone-normalized pitch features. Experiments with decision trees show that the tone-normalized pitch features give superior performance for sentence boundary detection in Chinese broadcast news. Furthermore, combining features yields a clear performance improvement, owing to intuitive feature-interaction rules learned in the decision tree. Pause duration and a tone-normalized pitch feature account for most of the feature usage in the best-performing decision tree.
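The abstract does not spell out how the pitch features are tone-normalized. As a rough illustration only, the sketch below assumes one common scheme: z-score each syllable's log-F0 within its lexical-tone class for a single speaker, so that a syllable's pitch is measured relative to other syllables carrying the same tone. It then derives a pitch-reset value across a candidate boundary. The input format (a list of `(tone, f0_hz)` pairs), the function names `tone_normalize` and `pitch_reset`, and the z-score scheme itself are hypothetical and are not claimed to be the paper's exact method.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def tone_normalize(syllables):
    """Z-score each syllable's log-F0 within its tone class (hypothetical scheme).

    syllables: list of (tone, f0_hz) pairs from one speaker (assumed input format).
    Returns the normalized pitch values in the same order as the input.
    """
    # Group log-F0 values by lexical tone.
    by_tone = defaultdict(list)
    for tone, f0 in syllables:
        by_tone[tone].append(math.log(f0))

    # Per-tone mean and standard deviation; fall back to 1.0 when a tone
    # class has no spread (e.g. only one syllable) to avoid dividing by zero.
    stats = {}
    for tone, values in by_tone.items():
        sd = pstdev(values)
        stats[tone] = (mean(values), sd if sd > 0 else 1.0)

    # Express each syllable's pitch relative to its own tone's distribution.
    normalized = []
    for tone, f0 in syllables:
        mu, sd = stats[tone]
        normalized.append((math.log(f0) - mu) / sd)
    return normalized


def pitch_reset(normalized, boundary_index):
    """Normalized pitch of the syllable after a candidate boundary minus the one before it."""
    return normalized[boundary_index] - normalized[boundary_index - 1]


if __name__ == "__main__":
    # Toy data: two tone-1 and two tone-4 syllables (values are illustrative only).
    norm = tone_normalize([(1, 220.0), (4, 180.0), (1, 200.0), (4, 240.0)])
    print(norm)
    print(pitch_reset(norm, 3))
```

Normalizing within each tone class removes most of the pitch movement that the lexical tone itself contributes, so a large positive reset across a pause is more likely to reflect a prosodic boundary than a change of tone. A decision tree could then split on such a value alongside pause duration.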
