Distilling Knowledge from an Ensemble of Models for Punctuation Prediction

This paper proposes an approach to distill knowledge from an ensemble of models to a single deep neural network (DNN) student model for punctuation prediction. This approach makes the DNN student model mimic the behavior of the ensemble. The ensemble consists of three single models. Kullback-Leibler (KL) divergence is used to minimize the difference between the output distribution of the DNN student model and the behavior of the ensemble. Experimental results on English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves up to 13.4% absolute overall F1-score improvement over the conventionally-trained baseline models.

[1]  V. Silber-Varod,et al.  The effect of pitch, intensity and pause duration in punctuation detection , 2012, 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel.

[2]  Michiel Bacchiani,et al.  Restoring punctuation and capitalization in transcribed speech , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Christoph Meinel,et al.  Punctuation Prediction for Unsegmented Transcript Based on Word Vector , 2016, LREC.

[4]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[5]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.

[6]  Hwee Tou Ng,et al.  Better Punctuation Prediction with Dynamic Conditional Random Fields , 2010, EMNLP.

[7]  Heidi Christensen,et al.  Punctuation annotation using statistical prosody models. , 2001 .

[8]  Nicola Ueffing,et al.  Improved models for automatic punctuation prediction for spoken and written text , 2013, INTERSPEECH.

[9]  William Chan,et al.  Transferring knowledge from a RNN to a DNN , 2015, INTERSPEECH.

[10]  Tanel Alumäe,et al.  LSTM for punctuation restoration in speech transcripts , 2015, INTERSPEECH.

[11]  Christoph Meinel,et al.  Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models , 2016, INTERSPEECH.

[12]  Yevgen Chebotar,et al.  Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition , 2016, INTERSPEECH.

[13]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[14]  Helena Moniz,et al.  Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[16]  James R. Glass,et al.  Sentence Detection Using Multiple Annotations , 2012, INTERSPEECH.

[17]  Lori Lamel,et al.  Development and Evaluation of Automatic Punctuation for French and English Speech-to-Text , 2012, INTERSPEECH.

[18]  Tanel Alumäe,et al.  Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration , 2016, INTERSPEECH.

[19]  Ryo Masumura,et al.  Domain adaptation of DNN acoustic models using knowledge distillation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Hwee Tou Ng,et al.  Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction , 2012, INTERSPEECH.

[21]  Yifan Gong,et al.  Learning small-size DNN with output-distribution-based criteria , 2014, INTERSPEECH.

[22]  Jan Niehues,et al.  Combination of NN and CRF models for joint detection of punctuation and disfluencies , 2015, INTERSPEECH.

[23]  Li Deng,et al.  Ensemble deep learning for speech recognition , 2014, INTERSPEECH.

[24]  Guillaume Lample,et al.  Neural Architectures for Named Entity Recognition , 2016, NAACL.

[25]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[26]  Jian Xue,et al.  Ensemble Learning Approaches in Speech Recognition , 2015 .

[27]  Peter Bell,et al.  Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Gökhan Tür,et al.  Automatic detection of sentence boundaries and disfluencies based on recognized words , 1998, ICSLP.

[29]  Ji-Hwan Kim,et al.  The use of prosody in a combined system for punctuation generation and speech recognition , 2001, INTERSPEECH.