Improved DNN-based segmentation for multi-genre broadcast audio
Mark J. F. Gales | Chao Zhang | Linlin Wang | Pierre Lanchantin | Xunying Liu | Philip C. Woodland | Yanmin Qian | Panagiota Karanasou
[1] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[2] Douglas A. Reynolds,et al. An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[3] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[4] Yoshihiko Nankaku,et al. Voice activity detection based on conditional random fields using multiple features , 2010, INTERSPEECH.
[5] Raymond W. M. Ng,et al. The 2015 Sheffield system for longitudinal diarisation of broadcast media , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[6] Mark J. F. Gales,et al. The development of the Cambridge University alignment systems for the multi-genre broadcast challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[7] Thad Hughes,et al. Recurrent neural networks for voice activity detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Joon-Hyuk Chang,et al. A statistical model-based voice activity detection using multiple DNNs and noise awareness , 2015, INTERSPEECH.
[9] Ricky Ho Yin Chan,et al. Improving broadcast news transcription by lightly supervised discriminative training , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[10] Björn W. Schuller,et al. Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[11] Harry Wechsler,et al. Detection of human speech in structured noise , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[12] Chao Zhang,et al. A general artificial neural network extension for HTK , 2015, INTERSPEECH.
[13] Mark J. F. Gales,et al. The Cambridge University March 2005 speaker diarisation system , 2005, INTERSPEECH.
[14] Mark J. F. Gales,et al. Progress in the CU-HTK broadcast news transcription system , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Rathinavelu Chengalvarayan,et al. Robust energy normalization using speech/nonspeech discriminator for German connected digit recognition , 1999, EUROSPEECH.
[16] Jean-Luc Gauvain,et al. Partitioning and transcription of broadcast news data , 1998, ICSLP.
[17] D. A. Reynolds,et al. The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations , 2004 .
[18] Xiao-Lei Zhang,et al. Deep Belief Networks Based Voice Activity Detection , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[19] DeLiang Wang,et al. Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection , 2014, INTERSPEECH.
[20] Alessandra Flammini,et al. Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach , 2002, EURASIP J. Adv. Signal Process..
[21] Joon-Hyuk Chang,et al. Voice activity detection based on statistical models and machine learning approaches , 2010, Comput. Speech Lang..
[22] Mark J. F. Gales,et al. Speaker diarisation and longitudinal linking in multi-genre broadcast data , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[23] Mickael Rouvier,et al. An open-source state-of-the-art toolbox for broadcast news diarization , 2013, INTERSPEECH.
[24] Mark J. F. Gales,et al. Cambridge University transcription systems for the multi-genre broadcast challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[25] Tara N. Sainath,et al. Fundamental technologies in modern speech recognition (DOI 10.1109/MSP.2012.2205597) , 2012 .
[26] Jun Du,et al. A universal VAD based on jointly trained deep neural networks , 2015, INTERSPEECH.
[27] Jean-Luc Gauvain,et al. Combining speaker identification and BIC for speaker diarization , 2005, INTERSPEECH.
[28] Nicholas W. D. Evans,et al. Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[29] Mark Liberman,et al. Speech activity detection on YouTube using deep neural networks , 2013, INTERSPEECH.
[30] Barbara Peskin,et al. Towards robust speaker segmentation: the ICSI-SRI Fall 2004 diarization system , 2004 .
[31] Steve Young,et al. Segment generation and clustering in the HTK broadcast news transcription system , 1998 .
[32] Nima Mesgarani,et al. Discrimination of speech from nonspeech based on multiscale spectro-temporal Modulations , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[33] Mohammad Hossein Moattar,et al. A review on speaker diarization systems and approaches , 2012, Speech Commun..