Deep clustering: Discriminative embeddings for segmentation and separation
暂无分享,去创建一个
Zhuo Chen | Jonathan Le Roux | John R. Hershey | Shinji Watanabe | J. Hershey | Shinji Watanabe | Zhuo Chen
[1] John R. Hershey,et al. Monaural speech separation and recognition challenge , 2010, Comput. Speech Lang..
[2] Enhong Chen,et al. Learning Deep Representations for Graph Clustering , 2014, AAAI.
[3] Jitendra Malik,et al. Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Yang Lu,et al. An algorithm that improves speech intelligibility in noise for normal-hearing listeners. , 2009, The Journal of the Acoustical Society of America.
[5] DeLiang Wang,et al. On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.
[6] Michael I. Jordan,et al. On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.
[7] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[8] Ron J. Weiss. Underdetermined source separation using speaker subspace models , 2009 .
[9] C. Darwin. Auditory grouping , 1997, Trends in Cognitive Sciences.
[10] Jonathan Le Roux,et al. Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures , 2014, ArXiv.
[11] Yoshitaka Nakajima,et al. Auditory Scene Analysis: The Perceptual Organization of Sound Albert S. Bregman , 1992 .
[12] DeLiang Wang,et al. Exploring Monaural Features for Classification-Based Speech Segregation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[13] Martin Cooke,et al. Modelling auditory processing and organisation , 1993, Distinguished dissertations in computer science.
[14] DeLiang Wang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[15] G. Kramer. Auditory Scene Analysis: The Perceptual Organization of Sound by Albert Bregman (review) , 2016 .
[16] Jitendra Malik,et al. Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[17] John R. Hershey,et al. Single-Channel Multitalker Speech Recognition , 2010, IEEE Signal Processing Magazine.
[18] M. Cugmas,et al. On comparing partitions , 2015 .
[19] DeLiang Wang,et al. An Unsupervised Approach to Cochannel Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[20] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Björn W. Schuller,et al. Single-channel speech separation with memory-enhanced recurrent neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Jitendra Malik,et al. Learning affinity functions for image segmentation: combining patch-based and gradient-based approaches , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..
[23] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] Marina Meila,et al. Local equivalences of distances between clusterings—a geometric perspective , 2012, Machine Learning.
[25] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[26] Le Roux. Sparse NMF – half-baked or well done? , 2015 .
[27] Daniel Patrick Whittlesey Ellis,et al. Prediction-driven computational auditory scene analysis , 1996 .
[28] Mikkel N. Schmidt,et al. Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.
[29] Daniel P. W. Ellis,et al. Estimating single-channel source separation masks: relevance vector machine classifiers vs. pitch-based masking , 2006, SAPA@INTERSPEECH.
[30] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[31] WangDeLiang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013 .
[32] Wei Wang,et al. Deep Embedding Network for Clustering , 2014, 2014 22nd International Conference on Pattern Recognition.
[33] C. K. Ogden. A Source Book Of Gestalt Psychology , 2013 .
[34] M. Wertheimer. Laws of organization in perceptual forms. , 1938 .
[35] John R. Hershey,et al. Super-human multi-talker speech recognition: A graphical modeling approach , 2010, Comput. Speech Lang..
[36] Ming-Yu Liu,et al. Recursive Context Propagation Network for Semantic Scene Labeling , 2014, NIPS.
[37] Marina Meil,et al. The stability of a good clustering , 2011 .
[38] DeLiang Wang,et al. An iterative model-based approach to cochannel speech separation , 2013, EURASIP J. Audio Speech Music. Process..
[39] Sam T. Roweis,et al. Factorial models and refiltering for speech separation and denoising , 2003, INTERSPEECH.
[40] Fabian J. Theis,et al. The signal separation evaluation campaign (2007-2010): Achievements and remaining challenges , 2012, Signal Process..
[41] Tuomas Virtanen,et al. Speech recognition using factorial hidden Markov models for separation in the feature space , 2006, INTERSPEECH.
[42] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Michael I. Jordan,et al. Learning Spectral Clustering, With Application To Speech Separation , 2006, J. Mach. Learn. Res..
[44] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[45] Paris Smaragdis,et al. Convolutive Speech Bases and Their Application to Supervised Speech Separation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.