Lightly supervised alignment of subtitles on multi-genre broadcasts
Raymond W. M. Ng | Oscar Saz-Torralba | Thomas Hain | Salil Deena | Mortaza Doulaty | Madina Hasan | Rosanna Milner | Bilal Khaliq | Julia Olcoz