An SVM based classification approach to speech separation

Monaural speech separation is a very challenging task. Computational auditory scene analysis (CASA) based systems utilize acoustic features to produce a time-frequency (T-F) mask. In this study, we propose a classification approach to the monaural separation problem. Our feature set consists of pitch-based features and amplitude modulation spectrum (AMS) features, which can discriminate both voiced and unvoiced speech from nonspeech interference. We employ support vector machines (SVMs), followed by a re-thresholding method, to classify each T-F unit as either target-dominated or interference-dominated. An auditory segmentation stage is then used to improve the SVM-generated results. Systematic evaluations show that our approach produces high-quality binary masks and outperforms a previous system in terms of classification accuracy.

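To make the per-unit classification stage concrete, the following is a minimal sketch in Python, assuming feature vectors (e.g., pitch-based and AMS features) have already been extracted for each T-F unit and that ideal-binary-mask labels are available for training. The function names, the RBF kernel choice, and the specific re-thresholding criterion are illustrative assumptions, not the paper's exact configuration; scikit-learn's SVC is used here as a LIBSVM-backed stand-in.

```python
# Sketch of SVM-based T-F unit classification with a simple re-thresholding
# step. Feature extraction and auditory segmentation are outside this snippet.
import numpy as np
from sklearn.svm import SVC


def train_unit_classifier(features, ibm_labels):
    """Train an SVM mapping a T-F unit's feature vector to a binary label.

    features   : (n_units, n_features) array of per-unit features
    ibm_labels : (n_units,) ideal-binary-mask labels
                 (1 = target-dominated, 0 = interference-dominated)
    """
    clf = SVC(kernel="rbf")  # LIBSVM-backed SVM; RBF kernel is an assumption
    clf.fit(features, ibm_labels)
    return clf


def classify_with_rethresholding(clf, features, threshold=0.0):
    """Label each T-F unit by thresholding the SVM decision value.

    Shifting `threshold` away from the default 0 is one simple way to
    realize a re-thresholding step, e.g., to trade off hits against
    false alarms on a development set.
    """
    scores = clf.decision_function(features)  # signed distance to the margin
    return (scores > threshold).astype(int)   # estimated binary mask labels


if __name__ == "__main__":
    # Synthetic stand-in for real per-unit features, for illustration only.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 10))
    y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

    clf = train_unit_classifier(X_train, y_train)

    X_test = rng.normal(size=(100, 10))
    mask_labels = classify_with_rethresholding(clf, X_test, threshold=-0.2)
    print(mask_labels[:20])
```

In this sketch, the estimated labels would then be grouped by an auditory segmentation stage before producing the final binary mask.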