Novel Deep Architectures in Speech Processing
Jonathan Le Roux | Scott Wisdom | John R. Hershey | Shinji Watanabe | Zhuo Chen | Yusuf Isik
[1] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Jonathan Le Roux,et al. Deep NMF for speech separation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Guillermo Sapiro,et al. Supervised non-euclidean sparse NMF via bilevel optimization with applications to speech enhancement , 2014, 2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA).
[4] Anil K. Jain,et al. Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell.
[5] Xiaohui Zhang,et al. Improving deep neural network acoustic models using generalized maxout networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Martin J. Wainwright,et al. A new class of upper bounds on the log partition function , 2002, IEEE Transactions on Information Theory.
[7] Jonathan Le Roux,et al. Discriminative NMF and its application to single-channel source separation , 2014, INTERSPEECH.
[8] Yifan Gong,et al. An Overview of Noise-Robust Automatic Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[9] Guillermo Sapiro,et al. Bilevel Sparse Models for Polyphonic Music Transcription , 2013, ISMIR.
[10] Michael I. Jordan,et al. An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.
[11] Michael I. Jordan,et al. Variational inference for Dirichlet process mixtures , 2006 .
[12] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[13] Justin Domke,et al. Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Yoshua Bengio,et al. Multi-Prediction Deep Boltzmann Machines , 2013, NIPS.
[15] Emanuel A. P. Habets,et al. New Insights Into the MVDR Beamformer in Room Acoustics , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[16] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Tomohiro Nakatani,et al. The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[18] Steve Renals,et al. WSJCAM0: a British English speech corpus for large vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[19] Koray Kavukcuoglu,et al. Visual Attention , 2020, Computational Models for Cognitive Vision.
[20] Philip H. S. Torr,et al. Recurrent Instance Segmentation , 2015, ECCV.
[21] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[22] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[23] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[24] Ken Kreutz-Delgado,et al. The Complex Gradient Operator and the CR-Calculus ECE275A - Lecture Supplement - Fall 2005 , 2009, 0906.4835.
[25] John R. Hershey,et al. Perceptual inference in generative models , 2005 .
[26] Richard M. Stern,et al. Likelihood-maximizing beamforming for robust hands-free speech recognition , 2004, IEEE Transactions on Speech and Audio Processing.
[27] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[28] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Veselin Stoyanov,et al. Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.
[30] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.
[31] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[32] Rémi Gribonval,et al. Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[33] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[34] Steve Renals,et al. Convolutional Neural Networks for Distant Speech Recognition , 2014, IEEE Signal Processing Letters.
[35] Lukasz Kaiser,et al. Neural GPUs Learn Algorithms , 2015, ICLR.
[36] William T. Freeman,et al. Constructing free-energy approximations and generalized belief propagation algorithms , 2005, IEEE Transactions on Information Theory.
[37] Patrice Marcotte,et al. An overview of bilevel optimization , 2007, Ann. Oper. Res.
[38] Jon Barker,et al. The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[39] Justin Domke,et al. Parameter learning with truncated message-passing , 2011, CVPR 2011.
[40] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[41] J. Eggert,et al. Sparse coding and NMF , 2004, 2004 IEEE International Joint Conference on Neural Networks.
[42] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[43] Bhiksha Raj,et al. Supervised and Semi-supervised Separation of Sounds from Single-Channel Mixtures , 2007, ICA.
[44] Paris Smaragdis,et al. Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] David Marr,et al. VISION A Computational Investigation into the Human Representation and Processing of Visual Information , 2009 .
[46] Richard S. Zemel,et al. Mean-Field Networks , 2014, ArXiv.
[47] Guillermo Sapiro,et al. Supervised Sparse Analysis and Synthesis Operators , 2013, NIPS.
[48] Daniel P. W. Ellis,et al. Model-Based Expectation-Maximization Source Separation and Localization , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[49] Yann LeCun,et al. Learning Fast Approximations of Sparse Coding , 2010, ICML.
[50] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[51] Hagai Attias,et al. New EM algorithms for source separation and deconvolution with a microphone array , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).
[52] Yongqiang Wang,et al. An investigation of deep neural networks for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[53] Martial Hebert,et al. Learning message-passing inference machines for structured prediction , 2011, CVPR 2011.
[54] Hiroshi Sawada,et al. A Multichannel MMSE-Based Framework for Speech Source Separation and Noise Reduction , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[55] Jonathan Le Roux,et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] M. Opper,et al. Comparing the Mean Field Method and Belief Propagation for Approximate Inference in MRFs , 2001 .
[57] Nancy Bertin,et al. Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.
[58] Jonathan Le Roux,et al. Deep unfolding for multichannel source separation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[60] Albert S. Bregman,et al. The Auditory Scene. (Book Reviews: Auditory Scene Analysis. The Perceptual Organization of Sound.) , 1990 .
[61] Jean Ponce,et al. Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[62] Jonathan Le Roux,et al. Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures , 2014, ArXiv.
[63] Tomer Hertz,et al. Pairwise Clustering and Graphical Models , 2003, NIPS.