An initial investigation of long-term adaptation for meeting transcription

Xie Chen would like to thank Toshiba Research Europe Ltd, Cambridge Research Lab, for funding his work. The authors would like to thank the Toshiba Cambridge Speech Group for allowing the data to be collected, also would like to thank Chao Zhang and Eric Wang for providing DNN and CMLLR transform tools.

[1]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[2]  Lukás Burget,et al.  Transcribing Meetings With the AMIDA Systems , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Alex Waibel,et al.  New developments in automatic meeting transcription , 2000, INTERSPEECH.

[4]  Martial Michel,et al.  The NIST Meeting Room Pilot Corpus , 2004, LREC.

[5]  Mark J. F. Gales,et al.  The efficient incorporation of MLP features into automatic speech recognition systems , 2011, Comput. Speech Lang..

[6]  Richard M. Schwartz,et al.  A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[7]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[9]  Yongqiang Wang,et al.  Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch , 2014, INTERSPEECH.

[10]  Dong Yu,et al.  Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[11]  Mark J. F. Gales,et al.  Unsupervised training and directed manual transcription for LVCSR , 2010, Speech Commun..

[12]  Mark J. F. Gales,et al.  Improved cross-task recognition using MMIE training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[13]  Frantisek Grézl,et al.  Optimizing bottle-neck features for lvcsr , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Jonathan G. Fiscus,et al.  The Rich Transcription 2007 Meeting Recognition Evaluation , 2007, CLEAR.

[15]  Hagen Soltau,et al.  Advances in automatic meeting record creation and access , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[16]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[17]  Mark J. F. Gales,et al.  Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..

[18]  Yongqiang Wang,et al.  Efficient lattice rescoring using recurrent neural network language models , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Hermann Ney,et al.  Improved MLLR speaker adaptation using confidence measures for conversational speech recognition , 2000, INTERSPEECH.

[20]  José Manuel Pardo,et al.  Robust Speaker Diarization for meetings , 2006 .

[21]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[22]  Thomas Hain,et al.  Recognition and understanding of meetings the AMI and AMIDA projects , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[23]  Mark J. F. Gales,et al.  MMI-MAP and MPE-MAP for acoustic model adaptation , 2003, INTERSPEECH.

[24]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..

[25]  Mark J. F. Gales,et al.  Improved neural network based language modelling and adaptation , 2010, INTERSPEECH.

[26]  Mark J. F. Gales,et al.  Porting: SwitchBoard to the VoiceMail task , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[27]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[28]  Gunnar Evermann,et al.  Posterior probability decoding, confidence estimation and system combination , 2000 .

[29]  Jean Carletta,et al.  The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.

[30]  Mark J. F. Gales,et al.  Integrated Online Speaker Clustering and Adaptation , 2011, INTERSPEECH.

[31]  Lukás Burget,et al.  The AMI System for the Transcription of Speech in Meetings , 2007, ICASSP.

[32]  Lukás Burget,et al.  Recurrent Neural Network Based Language Modeling in Meeting Recognition , 2011, INTERSPEECH.

[33]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.