Permutation invariant training of deep models for speaker-independent multi-talker speech separation
暂无分享,去创建一个
Jesper Jensen | Zheng-Hua Tan | Morten Kolbæk | Dong Yu | Dong Yu | Z. Tan | Morten Kolbæk | J. Jensen
[1] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[2] CookeMartin,et al. Monaural speech separation and recognition challenge , 2010 .
[3] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Enhong Chen,et al. Learning Deep Representations for Graph Clustering , 2014, AAAI.
[5] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[6] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[7] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.
[8] John R. Hershey,et al. Monaural speech separation and recognition challenge , 2010, Comput. Speech Lang..
[9] Dong Yu,et al. Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition , 2010 .
[10] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[11] Tuomas Virtanen,et al. Speech recognition using factorial hidden Markov models for separation in the feature space , 2006, INTERSPEECH.
[12] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[13] John R. Hershey,et al. Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system , 2006, INTERSPEECH.
[14] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.
[15] Le Roux. Sparse NMF – half-baked or well done? , 2015 .
[16] Daniel Patrick Whittlesey Ellis,et al. Prediction-driven computational auditory scene analysis , 1996 .
[17] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] M. Wertheimer. Laws of organization in perceptual forms. , 1938 .
[19] Geoffrey Zweig,et al. An introduction to computational networks and the computational network toolkit (invited talk) , 2014, INTERSPEECH.
[20] Daniel P. W. Ellis,et al. Monaural Speech Separation using Source-Adapted Models , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[21] E. C. Cmm,et al. on the Recognition of Speech, with , 2008 .
[22] Yoshua Bengio,et al. Convolutional networks for images, speech, and time series , 1998 .
[23] I. Nelken. Demonstrations of Auditory Scene Analysis: The Perceptual Organization of Sound by Albert S. Bregman and Pierre A. Ahad, MIT Press, 1996. £15.95 CD , 1997, Trends in Neurosciences.
[24] Dong Yu,et al. Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] Martin Cooke,et al. Modelling auditory processing and organisation , 1993, Distinguished dissertations in computer science.
[26] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[27] Paris Smaragdis,et al. Convolutive Speech Bases and Their Application to Supervised Speech Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[28] George Saon,et al. The IBM 2016 English Conversational Telephone Speech Recognition System , 2016, INTERSPEECH.
[29] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Yoshitaka Nakajima,et al. Auditory Scene Analysis: The Perceptual Organization of Sound Albert S. Bregman , 1992 .
[31] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[32] Wei Wang,et al. Deep Embedding Network for Clustering , 2014, 2014 22nd International Conference on Pattern Recognition.
[33] Mikkel N. Schmidt,et al. Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.