Focal Loss for Punctuation Prediction

Many approaches have been proposed to predict punctuation marks. Previous results demonstrate that these methods are effective. However, there still exists class imbalance problem during training. Most of the classes in the training set for punctuation prediction are non-punctuation marks. This will affect the performance of punctuation prediction tasks. Therefore, this paper uses a focal loss to alleviate this issue. The focal loss can down-weight easy examples and focus training on a sparse set of hard examples. Experiments are conducted on IWSLT2011 datasets. The results show that the punctuation predicting models trained with a focal loss obtain performance improvement over that trained with a cross entropy loss by up to 2.7% absolute overall F1-score on test set. The proposed model also outperforms previous state-of-the-art models.

[1]  John D. Lafferty,et al.  Cyberpunc: a lightweight punctuation annotation system for speech , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[2]  Nicola Ueffing,et al.  Improved models for automatic punctuation prediction for spoken and written text , 2013, INTERSPEECH.

[3]  Philip N. Garner,et al.  Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Jianhua Tao,et al.  Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Tanel Alumäe,et al.  LSTM for punctuation restoration in speech transcripts , 2015, INTERSPEECH.

[6]  Peter Bell,et al.  Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Christoph Meinel,et al.  Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models , 2016, INTERSPEECH.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Ya Li,et al.  Distilling Knowledge from an Ensemble of Models for Punctuation Prediction , 2017, INTERSPEECH.

[10]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Tanel Alumäe,et al.  Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration , 2016, INTERSPEECH.

[12]  Michiel Bacchiani,et al.  Restoring punctuation and capitalization in transcribed speech , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Jianhua Tao,et al.  Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[15]  Christoph Meinel,et al.  Punctuation Prediction for Unsegmented Transcript Based on Word Vector , 2016, LREC.

[16]  Jan Niehues,et al.  Combination of NN and CRF models for joint detection of punctuation and disfluencies , 2015, INTERSPEECH.

[17]  Andreas Stolcke,et al.  Enriching speech recognition with automatic detection of sentence boundaries and disfluencies , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Chng Eng Siong,et al.  Transfer Learning for Punctuation Prediction , 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[19]  Askars Salimbajevs,et al.  Restoring Punctuation and Capitalization Using Transformer Models , 2018, SLSP.

[20]  Thomas Hain,et al.  Noise-matched training of CRF based sentence end detection models , 2015, INTERSPEECH.

[21]  Hwee Tou Ng,et al.  Better Punctuation Prediction with Dynamic Conditional Random Fields , 2010, EMNLP.

[22]  Heidi Christensen,et al.  Punctuation annotation using statistical prosody models. , 2001 .

[23]  Máté Ákos Tündik,et al.  Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach , 2019, INTERSPEECH.

[24]  Ji-Hwan Kim,et al.  A combined punctuation generation and speech recognition system and its performance enhancement using prosody , 2003, Speech Commun..

[25]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[26]  Seokhwan Kim,et al.  Deep Recurrent Neural Networks with Layer-wise Multi-head Attentions for Punctuation Restoration , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27]  Najim Dehak,et al.  Punctuation Prediction Model for Conversational Speech , 2018, INTERSPEECH.