Incorporating External POS Tagger for Punctuation Restoration

Punctuation restoration is an important post-processing step in automatic speech recognition. Among other kinds of external information, part-of-speech (POS) taggers provide informative tags that suggest each input token’s syntactic role, which has been shown to benefit the punctuation restoration task. In this work, we incorporate an external POS tagger and fuse its predicted labels into the existing language model to provide syntactic information. In addition, we propose sequence boundary sampling (SBS) to learn punctuation positions more efficiently within a sequence tagging formulation. Experimental results show that our methods consistently yield performance gains and achieve a new state of the art on the common IWSLT benchmark. Further ablation studies show that both large pre-trained language models and the external POS tagger play essential roles in improving the model’s performance.
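
To make the sequence tagging formulation concrete, the following minimal Python sketch turns punctuated text into per-token labels and attaches a POS feature to each token representation. It is an illustration under stated assumptions, not the method of this work: the label names, the toy POS lookup `TOY_POS` (a stand-in for a real external tagger), the helper names, and fusion by simple concatenation are all hypothetical choices, since the abstract does not specify how the predicted labels are fused.

```python
# Toy illustration: punctuation restoration framed as sequence tagging,
# with an externally predicted POS tag attached to every token.
# All names below are hypothetical and chosen for illustration only.

# Assumed label set: the punctuation mark that follows a token, or "O" for none.
PUNCT_TO_LABEL = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}

# Stand-in for an external POS tagger; a real system would call a trained tagger.
TOY_POS = {
    "how": "ADV", "are": "VERB", "you": "PRON", "i": "PRON",
    "am": "VERB", "fine": "ADJ", "thanks": "NOUN",
}


def to_tagging_example(text):
    """Strip punctuation from `text` and return (tokens, labels), one label per token."""
    tokens, labels = [], []
    for raw in text.lower().split():
        word = raw.rstrip(",.?")
        if not word:
            continue  # skip items that consist only of punctuation
        trailing = raw[len(word):]
        tokens.append(word)
        labels.append(PUNCT_TO_LABEL.get(trailing[:1], "O"))
    return tokens, labels


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def fuse(token_vectors, pos_tags, tagset):
    """Concatenate each token vector with a one-hot encoding of its POS tag.

    Concatenation is an assumption made for this sketch.
    """
    return [
        vec + one_hot(tagset.index(tag), len(tagset))
        for vec, tag in zip(token_vectors, pos_tags)
    ]


tokens, labels = to_tagging_example("How are you? I am fine, thanks.")
pos_tags = [TOY_POS.get(tok, "X") for tok in tokens]
tagset = sorted(set(TOY_POS.values())) + ["X"]

# Placeholder 2-dimensional token vectors standing in for language model outputs.
token_vectors = [[0.1, 0.2] for _ in tokens]
fused = fuse(token_vectors, pos_tags, tagset)
```

In a real system, the placeholder vectors would be contextual representations from a pre-trained language model, and a classifier over the fused vectors would predict one punctuation label per token.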
