Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech

Until very recently, punctuation marks for automatic speech recognition (ASR) output have mostly been generated from the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks and takes both prosodic and lexical information into account to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with the transcribed speech improves the accuracy of punctuation generation.
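To make the idea of "parallel sequences of prosodic cues aligned with transcribed speech" concrete, the following is a minimal sketch and not the architecture proposed here: it omits the recurrent networks entirely and only illustrates how one prosodic vector per word can be attended over. The feature names (`pause_after_sec`, `mean_f0`), the numeric values, the fixed `query` vector, and the use of plain dot-product scoring are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention over `keys`; returns (weights, context vector)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context


# Lexical sequence: the recognized words (hypothetical utterance).
words = ["so", "what", "do", "you", "think"]

# Parallel prosodic sequence, aligned one-to-one with `words`.
# Each entry is [pause_after_sec, mean_f0]; the values are invented.
prosody = [
    [0.05, 1.0],
    [0.02, 2.5],
    [0.01, 0.5],
    [0.03, -0.5],
    [0.60, -3.0],
]
assert len(words) == len(prosody)  # the sequences must stay aligned

# Hypothetical query that only looks at pause duration. In a trained model
# the query would come from a learned hidden state, not a fixed vector.
query = [1.0, 0.0]

weights, context = attend(query, prosody)
for word, weight in zip(words, weights):
    print(f"{word:>6}  attention={weight:.3f}")
```

With this query, the word followed by the longest pause receives the largest attention weight, which matches the intuition that long breaks are a cue for punctuation.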
