A Deep Neural Network approach for Voice Activity Detection in multi-room domestic scenarios
Francesco Piazza | Stefano Squartini | Emanuele Principi | Roberto Bonfigli | Giacomo Ferroni
[1] Ji Wu, et al. An efficient voice activity detection algorithm by combining statistical model and energy detection, 2011, EURASIP J. Adv. Signal Process.
[2] B. Kollmeier, et al. Speech enhancement based on physiological and psychoacoustical models of modulation perception and binaural interaction, 1994, The Journal of the Acoustical Society of America.
[3] Martin Wolf, et al. Channel selection measures for multi-microphone speech recognition, 2014, Speech Commun.
[4] E. Shlomot, et al. ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications, 1997, IEEE Commun. Mag.
[5] Joon-Hyuk Chang, et al. Voice activity detection based on generalized gamma distribution, 2005, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).
[6] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.
[7] Javier Ramírez, et al. Efficient voice activity detection algorithms using long-term speech information, 2004, Speech Commun.
[8] Erik Marchi, et al. Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Li Deng, et al. A tutorial survey of architectures, algorithms, and applications for deep learning, 2014, APSIPA Transactions on Signal and Information Processing.
[10] Björn W. Schuller, et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor, 2013, ACM Multimedia.
[11] Petros Maragos, et al. The DIRHA simulated corpus, 2014, LREC.
[12] James Allan, et al. A comparison of statistical significance tests for information retrieval evaluation, 2007, CIKM '07.
[13] DeLiang Wang, et al. Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection, 2014, INTERSPEECH.
[14] Francesco Piazza, et al. A distributed system for recognizing home automation commands and distress calls in the Italian language, 2013, INTERSPEECH.
[15] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[16] Petros Maragos, et al. The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home, 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).
[17] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980.
[18] Alessio Brutti, et al. A speech event detection and localization task for multiroom environments, 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).
[19] D. J. Hermes, et al. Measurement of pitch by subharmonic summation, 1988, The Journal of the Acoustical Society of America.
[20] Thad Hughes, et al. Recurrent neural networks for voice activity detection, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] S. Boll, et al. Suppression of acoustic noise in speech using spectral subtraction, 1979.
[22] S. Squartini, et al. Neural Networks Based Methods for Voice Activity Detection in a Multi-room Domestic Environment, 2014.
[23] Chungyong Lee, et al. Robust voice activity detection algorithm for estimating noise spectrum, 2000.
[24] Yuuki Tachioka, et al. Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments, 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).
[25] Wonyong Sung, et al. A statistical model-based voice activity detection, 1999, IEEE Signal Processing Letters.
[26] Eduard A. Jorswieck, et al. Sum Rate Optimization by Spatial Precoding for a Multiuser MIMO DFT-Precoded OFDM Uplink, 2011, EURASIP J. Adv. Signal Process.
[27] M. Picheny, et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences, 2017.
[28] Joon-Hyuk Chang, et al. Voice activity detection based on statistical models and machine learning approaches, 2010, Comput. Speech Lang.
[29] I. Cohen, et al. AR-GARCH in Presence of Noise: Parameter Estimation and Its Application to Voice Activity Detection, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Hynek Hermansky, et al. RASTA processing of speech, 1994, IEEE Trans. Speech Audio Process.
[31] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[32] Sanjit K. Mitra, et al. Voice activity detection based on multiple statistical models, 2006, IEEE Transactions on Signal Processing.
[33] Javier Ramírez, et al. Statistical voice activity detection using a multiple observation likelihood ratio test, 2005, IEEE Signal Processing Letters.
[34] Richard M. Stern, et al. Robust speech recognition using temporal masking and thresholding algorithm, 2014, INTERSPEECH.