HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors
Tomi Kinnunen | Zheng-Hua Tan | Elie el Khoury | Md. Sahidullah | Alexey Sholokhov | Dennis Alexander Lehmann Thomsen
[1] R. Tucker,et al. Voice activity detection using a periodicity measure , 1992 .
[2] Joon-Hyuk Chang,et al. Voice activity detection based on statistical models and machine learning approaches , 2010, Comput. Speech Lang..
[3] Jean-Luc Gauvain,et al. Improving Speaker Diarization , 2004 .
[4] E. Shlomot,et al. ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications , 1997, IEEE Commun. Mag..
[5] Jianwu Dang,et al. Voice Activity Detection Based on an Unsupervised Learning Framework , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Roeland Ordelman,et al. Filtering the unknown: speech activity detection in heterogeneous video collections , 2007, INTERSPEECH.
[7] Themos Stafylakis,et al. Supervised/Unsupervised Voice Activity Detectors for Text-dependent Speaker Recognition on the RSR2015 Corpus , 2014, Odyssey.
[8] Ponani S. Gopalakrishnan,et al. Clustering via the Bayesian information criterion with applications in speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[9] Pascal Druyts,et al. Applying Logistic Regression to the Fusion of the NIST'99 1-Speaker Submissions , 2000, Digit. Signal Process..
[10] Javier Ramírez,et al. Efficient voice activity detection algorithms using long-term speech information , 2004, Speech Commun..
[11] Ji Wu,et al. Transfer Learning for Voice Activity Detection: A Denoising Deep Neural Network Perspective , 2013, ArXiv.
[12] Zheng-Hua Tan,et al. Low-Complexity Variable Frame Rate Analysis for Speech Recognition and Voice Activity Detection , 2010, IEEE Journal of Selected Topics in Signal Processing.
[13] Elie el Khoury,et al. Improved speaker diarization system for meetings , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[14] Thad Hughes,et al. Recurrent neural networks for voice activity detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Thomas P. Minka,et al. Algorithms for maximum-likelihood logistic regression , 2003 .
[16] Yun Lei,et al. A noise-robust system for NIST 2012 speaker recognition evaluation , 2013, INTERSPEECH.
[17] Juan Manuel Górriz,et al. Voice Activity Detection. Fundamentals and Speech Recognition System Robustness , 2007 .
[18] A. Savitzky,et al. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. , 1964 .
[19] John H. L. Hansen,et al. I4U submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification , 2013, INTERSPEECH.
[20] Tomi Kinnunen,et al. A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Douglas A. Reynolds,et al. A Tutorial on Text-Independent Speaker Verification , 2004, EURASIP J. Adv. Signal Process..
[22] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Brian Kingsbury,et al. Improvements to the IBM speech activity detection system for the DARPA RATS program , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Daniel Garcia-Romero,et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.
[25] Yun Lei,et al. SoftSAD: Integrated frame-based speech confidence for speaker recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Elie Khoury,et al. I-Vectors for speech activity detection , 2016, Odyssey.
[27] John Platt,et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .
[28] Mark Liberman,et al. Speech activity detection on YouTube using deep neural networks , 2013, INTERSPEECH.
[29] Björn W. Schuller,et al. Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[30] Rainer Martin,et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..
[31] Wonyong Sung,et al. A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.
[32] David A. van Leeuwen,et al. Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[33] Hynek Hermansky,et al. Developing a speaker identification system for the DARPA RATS project , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[34] Ahmet M. Kondoz,et al. Digital Speech: Coding for Low Bit Rate Communication Systems , 1995 .
[35] Malcolm Slaney,et al. Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[36] Chanwoo Kim,et al. Power-normalized cepstral coefficients (PNCC) for robust speech recognition , 2016 .
[37] Joon-Hyuk Chang,et al. Voice activity detection based on a family of parametric distributions , 2007, Pattern Recognit. Lett..
[38] Thomas Hain,et al. Segmentation and classification of broadcast news audio , 1998, ICSLP.
[39] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[40] David A. van Leeuwen,et al. Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006 , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Herbert Gish,et al. Segregation of speakers for speech recognition and speaker identification , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[42] John H. L. Hansen,et al. Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux , 2013, IEEE Signal Processing Letters.