Paper Information - Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction


The use of dynamic conditional random fields (DCRF) has been shown to outperform linear-chain conditional random fields (LCRF) for punctuation prediction on conversational speech texts [4]. In this paper, we combine lexical, prosodic, and modified n-gram score features in the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method based on LCRF or a maximum entropy model (MaxEnt). We also analyze the importance of the various features under DCRF, LCRF, MaxEnt, and the hidden-event n-gram model (HEN). In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory limitations without incurring a significant performance penalty. Our results show that adding prosodic and n-gram score features yields about a 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by LCRF, MaxEnt, and HEN.
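The abstract reports its gains in terms of the F1-measure and relative error reduction. The following is a minimal Python sketch of that arithmetic, assuming that F1 is computed from true-positive, false-positive and false-negative counts of predicted marks and that "error" means 1 - F1. The function names `f1_score` and `relative_error_reduction` are hypothetical, and the counts are made up for illustration; they are not results from the paper.

```python
# Hypothetical helpers illustrating the evaluation arithmetic mentioned in the
# abstract. The counts below are made up, not results from the paper.


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = harmonic mean of precision and recall over predicted marks."""
    if true_pos == 0:
        # No correct predictions: precision and recall are both 0 (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(f1_base: float, f1_new: float) -> float:
    """Relative reduction of the error 1 - F1 (an assumed error definition)."""
    err_base = 1.0 - f1_base
    err_new = 1.0 - f1_new
    if err_base == 0.0:
        raise ValueError("baseline error is zero; reduction is undefined")
    return (err_base - err_new) / err_base


if __name__ == "__main__":
    # Illustrative counts only: a baseline system vs. one with extra features.
    base = f1_score(true_pos=60, false_pos=40, false_neg=40)
    new = f1_score(true_pos=68, false_pos=32, false_neg=32)
    print(f"baseline F1 = {base:.2f}, new F1 = {new:.2f}")
    print(f"relative error reduction = {relative_error_reduction(base, new):.0%}")
```

With these made-up counts the baseline F1 is 0.60 and the new F1 is 0.68, so the error falls from 0.40 to 0.32, a 20% relative reduction. Measuring gains relative to the remaining error, rather than as absolute F1 points, makes improvements comparable across systems with different baseline accuracy.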

[1] John D. Lafferty et al. Cyberpunc: a lightweight punctuation annotation system for speech. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98, Cat. No. 98CH36181), 1998.

[2] Michiel Bacchiani et al. Restoring punctuation and capitalization in transcribed speech. 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009.

[3] Andreas Stolcke et al. Enriching speech recognition with automatic detection of sentence boundaries and disfluencies. IEEE Transactions on Audio, Speech, and Language Processing, 2006.

[4] Hwee Tou Ng et al. Better Punctuation Prediction with Dynamic Conditional Random Fields. EMNLP, 2010.

[5] Heidi Christensen et al. Punctuation annotation using statistical prosody models. 2001.

[6] Andrew McCallum et al. Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data. J. Mach. Learn. Res., 2004.

[7] Paul Boersma et al. Praat, a system for doing phonetics by computer. 2002.

[8] Andreas Stolcke et al. SRILM - an extensible language modeling toolkit. INTERSPEECH, 2002.

[9] Dilek Z. Hakkani-Tür et al. Speech segmentation and spoken document processing. IEEE Signal Processing Magazine, 2008.

[10] Geoffrey Zweig et al. Maximum entropy model for punctuation annotation from speech. INTERSPEECH, 2002.

[11] Gökhan Tür et al. Automatic detection of sentence boundaries and disfluencies based on recognized words. ICSLP, 1998.

[12] Ji-Hwan Kim et al. The use of prosody in a combined system for punctuation generation and speech recognition. INTERSPEECH, 2001.

[13] Mark Liberman et al. Large, Multilingual, Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT-2 and TDT-3 Corpus Efforts. LREC, 2000.