Daniel P. W. Ellis | Kevin W. Wilson | Andrew C. Gallagher | Sourish Chaudhuri | Caroline Pantofaru | Zhonghua Xi | Joseph Roth | Loretta Guarino Reid | Nathan Reale | Radhika Marvin | Liat Kaver
[1] Olivier Galibert, et al. The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, 2012, LREC.
[2] Mark Liberman, et al. Speech activity detection on youtube using deep neural networks, 2013, INTERSPEECH.
[3] Jean Carletta, et al. The AMI meeting corpus, 2005.
[4] Gerhard Widmer, et al. Improving voice activity detection in movies, 2015, INTERSPEECH.
[5] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Sridha Sridharan, et al. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms, 2010, INTERSPEECH.
[7] Inseon Jang, et al. Enhanced Feature Extraction for Speech Detection in Media Audio, 2017, INTERSPEECH.
[8] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[9] Spyridon Matsoukas, et al. Developing a Speech Activity Detection System for the DARPA RATS Program, 2012, INTERSPEECH.
[10] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[11] Olivier Galibert, et al. The REPERE Corpus: a multimodal corpus for person recognition, 2012, LREC.
[12] Israel Cohen, et al. Adaptive weighting parameter in audio-visual voice activity detection, 2016, 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE).
[13] Tara N. Sainath, et al. Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition, 2017, INTERSPEECH.
[14] Vaibhava Goel, et al. Deep multimodal learning for Audio-Visual Speech Recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jonathan Le Roux, et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Israel Cohen, et al. Audio-Visual Voice Activity Detection Using Diffusion Maps, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] Aren Jansen, et al. Audio Set: An ontology and human-labeled dataset for audio events, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Cordelia Schmid, et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Rathinavelu Chengalvarayan, et al. Robust energy normalization using speech/nonspeech discriminator for German connected digit recognition, 1999, EUROSPEECH.
[20] Sridha Sridharan, et al. Complete-linkage clustering for voice activity detection in audio and visual speech, 2015, INTERSPEECH.
[21] Carlos Busso, et al. Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection, 2017, INTERSPEECH.
[22] Guillaume Gravier, et al. The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts, 2009, INTERSPEECH.
[23] Malcolm Slaney, et al. Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers, 2017, ArXiv.
[24] Joon-Hyuk Chang, et al. Voice activity detection based on statistical models and machine learning approaches, 2010, Comput. Speech Lang.
[25] Xavier Anguera Miró, et al. Robust speaker diarization for meetings: ICSI RT06s evaluation system, 2006, INTERSPEECH.
[26] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[27] John H. L. Hansen, et al. Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux, 2013, IEEE Signal Processing Letters.
[28] Aristodemos Pnevmatikakis, et al. Voice activity detection using audio-visual information, 2009, 2009 16th International Conference on Digital Signal Processing.
[29] Henrik Schulz, et al. Speaker diarization of broadcast news in Albayzin 2010 evaluation campaign, 2012, EURASIP J. Audio Speech Music. Process.
[30] Roland Maas, et al. Domain-Specific Utterance End-Point Detection for Speech Recognition, 2017, INTERSPEECH.
[31] Won-Ho Shin, et al. Speech/non-speech classification using multiple features for robust endpoint detection, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[32] Chungyong Lee, et al. Robust voice activity detection algorithm for estimating noise spectrum, 2000.