Paper information - A Combined Statistical and Machine Learning Approach For Single Channel Speech Enhancement

University of Minnesota Ph.D. dissertation. May 2015. Major: Electrical Engineering. Advisor: Zhi-Quan Luo. 1 computer file (PDF); ix, 116 pages.

Hung-Wei Tseng

[1] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[2] Rajat Raina,et al. Self-taught learning: transfer learning from unlabeled data , 2007, ICML '07.

[3] Stephen D. Voran,et al. Objective estimation of perceived speech quality. I. Development of the measuring normalizing block technique , 1999, IEEE Trans. Speech Audio Process..

[4] DeLiang Wang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[5] H. Sebastian Seung,et al. Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[6] P. Loizou,et al. Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. , 2008, The Journal of the Acoustical Society of America.

[7] Mikkel N. Schmidt,et al. Single-channel speech separation using sparse non-negative matrix factorization , 2006, INTERSPEECH.

[8] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[9] J. Larsen,et al. Reduction of non-stationary noise using a non-negative latent variable decomposition , 2008, 2008 IEEE Workshop on Machine Learning for Signal Processing.

[10] Rainer Martin,et al. Speech enhancement based on minimum mean-square error estimation and supergaussian priors , 2005, IEEE Transactions on Speech and Audio Processing.

[11] Dennis H. Klatt,et al. Prediction of perceived phonetic distance from critical-band spectra: A first step , 1982, ICASSP.

[12] Gautham J. Mysore,et al. Universal speech models for speaker independent single channel source separation , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13] Paris Smaragdis,et al. Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments , 2012, INTERSPEECH.

[14] Rainer Martin,et al. Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[15] Xiangfeng Wang,et al. The Linearized Alternating Direction Method of Multipliers for Dantzig Selector , 2012, SIAM J. Sci. Comput..

[16] Mingyi Hong,et al. Combining sparse NMF with deep neural network: A new classification-based approach for speech enhancement , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[17] Richard C. Hendriks,et al. Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[18] Guillermo Sapiro,et al. Sparse Representation for Computer Vision and Pattern Recognition , 2010, Proceedings of the IEEE.

[19] Paris Smaragdis,et al. A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[21] Bhiksha Raj,et al. A Probabilistic Latent Variable Model for Acoustic Modeling , 2006 .

[22] Yang Lu,et al. An algorithm that improves speech intelligibility in noise for normal-hearing listeners. , 2009, The Journal of the Acoustical Society of America.

[23] DeLiang Wang,et al. An SVM based classification approach to speech separation , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[24] Thomas M. Cover,et al. Elements of Information Theory , 2005 .

[25] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[26] Marc Teboulle,et al. A proximal-based decomposition method for convex minimization problems , 1994, Math. Program..

[27] Yi Hu,et al. Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[28] M. K. Hasan,et al. A modified a priori SNR for speech enhancement using spectral subtraction rules , 2004, IEEE Signal Processing Letters.

[29] S. Godsill,et al. Simple alternatives to the Ephraim and Malah suppression rule for speech enhancement , 2001, Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing (Cat. No.01TH8563).

[30] Guy J. Brown,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[31] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[32] Simon J. Godsill,et al. Efficient Alternatives to the Ephraim and Malah Suppression Rule for Audio Signal Enhancement , 2003, EURASIP J. Adv. Signal Process..

[33] Bin Chen,et al. A Laplacian-based MMSE estimator for speech enhancement , 2007, Speech Commun..

[34] Stanley Osher,et al. A Unified Primal-Dual Algorithm Framework Based on Bregman Iteration , 2010, J. Sci. Comput..

[35] Schuyler Quackenbush,et al. Objective measures of speech quality , 1995 .

[36] Zhi-Quan Luo,et al. A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization , 2012, SIAM J. Optim..

[37] James M Kates,et al. Coherence and the speech intelligibility index. , 2004, The Journal of the Acoustical Society of America.

[38] Tara N. Sainath,et al. Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[39] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[40] Bhiksha Raj,et al. Regularized non-negative matrix factorization with temporal dependencies for speech denoising , 2008, INTERSPEECH.

[41] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .

[42] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[43] Dimitri P. Bertsekas,et al. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators , 1992, Math. Program..

[44] Nancy Bertin,et al. Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[45] E.J. Candes,et al. An Introduction To Compressive Sampling , 2008, IEEE Signal Processing Magazine.

[46] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[47] Jonathan G. Fiscus,et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .

[48] John G. Beerends,et al. A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation , 1992 .

[49] C. Févotte,et al. Single Sensor Source Separation Using Multiple-Window STFT Representation , 2006 .

[50] Y. Ephraim,et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[51] Paul R. White,et al. MMSE Speech Spectral Amplitude Estimators With Chi and Gamma Speech Priors , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[52] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[53] Yi Hu,et al. A comparative intelligibility study of single-microphone noise reduction algorithms. , 2007, The Journal of the Acoustical Society of America.

[54] Yi Hu,et al. Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. , 2009, The Journal of the Acoustical Society of America.

[55] Arne Leijon,et al. A new linear MMSE filter for single channel speech enhancement based on Nonnegative Matrix Factorization , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[56] Francis Bach,et al. Itakura-Saito nonnegative matrix factorization with group sparsity , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[57] Herman J. M. Steeneken,et al. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[58] Paris Smaragdis,et al. Prediction based filtering and smoothing to exploit temporal dependencies in NMF , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[59] Tao Zhang,et al. A novel single channel speech enhancement approach by combining Wiener filter and dictionary learning , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[60] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[61] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[62] Tuomas Virtanen,et al. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[63] Stephen P. Boyd,et al. Enhancing Sparsity by Reweighted ℓ1 Minimization , 2007, arXiv:0711.1612.

[64] H. Sebastian Seung,et al. Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[65] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[66] Jérôme Idier,et al. Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.

[67] Peter Vary,et al. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model , 2005, EURASIP J. Adv. Signal Process..

[68] Jesper Jensen,et al. Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[69] Tao Zhang,et al. A single channel speech enhancement approach by combining statistical criterion and multi-frame sparse dictionary learning , 2013, INTERSPEECH.

[70] Paris Smaragdis,et al. Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[71] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72] J. Larsen,et al. Wind Noise Reduction using Non-Negative Sparse Coding , 2007, 2007 IEEE Workshop on Machine Learning for Signal Processing.