Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection
[1] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[2] William Chan,et al. Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments , 2016, INTERSPEECH.
[3] Carlos Segura,et al. Overlap detection for speaker diarization by fusing spectral and spatial features , 2010, INTERSPEECH.
[4] Petros Maragos,et al. AM-FM energy detection and separation in noise using multiband energy operators , 1993, IEEE Trans. Signal Process..
[5] John H. L. Hansen,et al. Methods for stress classification: nonlinear TEO and linear speech based features , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[6] John H. L. Hansen,et al. Robust overlapped speech detection and its application in word-count estimation for Prof-Life-Log data , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Jun Du,et al. Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] M. A. Bee,et al. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? , 2008, Journal of comparative psychology.
[9] John R. Hershey,et al. Super-human multi-talker speech recognition: A graphical modeling approach , 2010, Comput. Speech Lang..
[10] Petros Maragos,et al. Energy separation in signal modulations with application to speech analysis , 1993, IEEE Trans. Signal Process..
[11] Mohammad Hassan Savoji,et al. Supervised speech enhancement using online Group-Sparse Convolutive NMF , 2016, 2016 8th International Symposium on Telecommunications (IST).
[12] Jean Carletta,et al. The AMI Meeting Corpus: A Pre-announcement , 2005, MLMI.
[13] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[14] Björn W. Schuller,et al. Detecting overlapping speech with long short-term memory recurrent neural networks , 2013, INTERSPEECH.
[15] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[16] D. Oberfeld,et al. Individual differences in selective attention predict speech identification at a cocktail party , 2016, eLife.
[17] A. Treisman. Contextual Cues in Selective Listening , 1960 .
[18] E. C. Cmm,et al. on the Recognition of Speech, with , 2008 .
[19] Josh H. McDermott. The cocktail party problem , 2009, Current Biology.
[20] John H. L. Hansen,et al. Nonlinear feature based classification of speech under stress , 2001, IEEE Trans. Speech Audio Process..
[21] John H.L. Hansen,et al. Frame-Based Overlapping Speech Detection Using Convolutional Neural Networks , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Valentin Andrei,et al. Detecting Overlapped Speech on Short Timeframes Using Deep Learning , 2017, INTERSPEECH.
[23] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[24] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Tomohiro Nakatani,et al. All-neural Online Source Separation, Counting, and Diarization for Meeting Analysis , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] J. Deutsch,et al. Attention: Some theoretical considerations , 1963 .
[27] John H. L. Hansen,et al. Overlapped-speech detection with applications to driver assessment for in-vehicle active safety systems , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[28] Soheil Khorram,et al. Probabilistic Permutation Invariant Training for Speech Separation , 2019, INTERSPEECH.
[29] Gerald Friedland,et al. Improved Overlapped Speech Handling for Speaker Diarization , 2011, INTERSPEECH.
[30] Gerald Friedland,et al. Overlapped speech detection for improved speaker diarization in multiparty meetings , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[31] John H. L. Hansen,et al. Classification of speech under stress based on features derived from the nonlinear Teager energy operator , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[32] John H. L. Hansen,et al. Teager–Kaiser Energy Operators for Overlapped Speech Detection , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[34] John H. L. Hansen,et al. Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates , 2018, INTERSPEECH.
[35] Dong Yu,et al. Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.