Punctuation prediction using a bidirectional recurrent neural network with part-of-speech tagging

Most automatic speech recognition (ASR) systems cannot generate punctuation, which makes the transcribed output difficult to read and less suitable for tasks such as dictation. This paper introduces a procedure that automatically inserts punctuation into unpunctuated sentences using a bidirectional recurrent neural network with an attention mechanism and part-of-speech (POS) tags. On the WikiText Long Term Dependency Language Modelling Dataset, handling 11 different punctuation symbols, the model achieved a punctuation error rate of 31.4% and an F1 score of 78.5%. When trained on consecutive sentences from a smaller dataset, the Europarl v7 corpus, the model still achieved a punctuation error rate of 48.1% and an F1 score of 64.7%. In both cases, the proposed system outperforms previous state-of-the-art systems trained on the same datasets, showing the advantage of using POS tag information and an encoder-decoder network.
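
To make the task concrete, the following minimal sketch shows one way punctuated text could be turned into training pairs of unpunctuated words and punctuation labels, and how an F1 score over the punctuation marks could be computed. It is an illustration under assumptions, not the paper's implementation: the label set `PUNCT`, the "O" label for "no punctuation", and the function names `to_labelled_pairs` and `punctuation_f1` are hypothetical, only six of the 11 symbols mentioned above are covered, and POS tags and the network itself are left out.

```python
import re

# Hypothetical label set (assumption): the mark that follows a word.
# "O" is used here to mean "no punctuation after this word".
PUNCT = {",", ".", "?", "!", ";", ":"}


def to_labelled_pairs(text):
    """Split punctuated text into (lowercased word, following-mark) pairs."""
    # Words (with an optional apostrophe part) or single non-space symbols.
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)
    pairs = []
    for tok in tokens:
        if tok in PUNCT:
            # Attach the mark to the preceding word; keep only the first mark.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], tok)
        elif re.match(r"\w", tok):
            pairs.append((tok.lower(), "O"))
        # Symbols outside PUNCT (quotes, brackets) are dropped in this sketch.
    return pairs


def punctuation_f1(reference, predicted):
    """Micro-averaged F1 over positions that carry a punctuation mark."""
    true_pos = sum(1 for r, p in zip(reference, predicted) if r == p and r != "O")
    pred_pos = sum(1 for p in predicted if p != "O")
    ref_pos = sum(1 for r in reference if r != "O")
    if true_pos == 0:
        return 0.0
    precision = true_pos / pred_pos
    recall = true_pos / ref_pos
    return 2 * precision * recall / (precision + recall)


pairs = to_labelled_pairs("Hello, world. How are you?")
words = [w for w, _ in pairs]    # unpunctuated model input
labels = [l for _, l in pairs]   # target punctuation per word
```

In this sketch the model input would be `words` and the target would be `labels`; a system such as the one described above would additionally attach a POS tag to each word before encoding.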
