Performance evaluation for transform-domain model-based single-channel speech separation

It has already been demonstrated that the choice of features has a much larger effect on the overall accuracy of speech applications than the choice of generative models does. In this paper, we propose a subband perceptually weighted transformation (SPWT) applied to the magnitude spectrum to improve the performance of single-channel speech separation (SCSS). In particular, we compare three feature types: the log-spectrum, the magnitude spectrum, and the proposed SPWT. A comprehensive statistical analysis is performed to evaluate the performance of a VQ-based SCSS framework in terms of its lower error bound. At the core of this approach are two trained codebooks of quantized speaker feature vectors, over which the main separation evaluation is performed. The simulation results show that the proposed transformation is an attractive candidate for improving the separation performance of model-based SCSS. It is also observed that the proposed feature yields a lower error bound in terms of spectral distortion (SD), as well as a higher segmental SNR (SSNR), than the other features.
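The abstract does not spell out the VQ-based framework, so the sketch below is only a rough illustration of the usual setup: a codebook is trained per speaker (here with plain k-means, in the spirit of the LBG algorithm), and separation is scored by an exhaustive joint search for the codeword pair that best explains each mixture frame, assuming additivity of magnitude-domain features. The function names (`train_codebook`, `best_codeword_pair`), the Euclidean distortion, and all parameter values are hypothetical, not taken from the paper.

```python
import numpy as np

def train_codebook(features, num_codewords=64, num_iters=20, seed=0):
    """Train a VQ codebook with plain k-means (LBG-like);
    `features` has shape (num_frames, dim)."""
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen training frames.
    codebook = features[rng.choice(len(features), num_codewords, replace=False)]
    for _ in range(num_iters):
        # Assign each frame to its nearest codeword (squared Euclidean distance).
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Move each codeword to the centroid of its assigned frames.
        for k in range(num_codewords):
            members = features[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def best_codeword_pair(mixture_frame, codebook_a, codebook_b):
    """Exhaustive joint search: return the codeword pair whose sum best
    matches the mixture frame (additivity assumption for magnitudes)."""
    best_pair, best_err = (0, 0), np.inf
    for i, ca in enumerate(codebook_a):
        for j, cb in enumerate(codebook_b):
            err = np.sum((mixture_frame - (ca + cb)) ** 2)
            if err < best_err:
                best_pair, best_err = (i, j), err
    return codebook_a[best_pair[0]], codebook_b[best_pair[1]]
```

With oracle knowledge of each speaker's clean frames, quantizing them against their own codebooks and measuring the residual distortion gives the kind of lower error bound the analysis refers to.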

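The two reported figures of merit can be computed as follows. This is a minimal sketch of the standard definitions of log-spectral distortion and segmental SNR; the frame length, clipping thresholds, and epsilon guards are illustrative choices, not values from the paper.

```python
import numpy as np

def spectral_distortion(ref_mag, est_mag, eps=1e-12):
    """Frame-averaged log-spectral distortion in dB between reference and
    estimated magnitude spectra, each of shape (num_frames, num_bins)."""
    log_diff = 20.0 * np.log10((ref_mag + eps) / (est_mag + eps))
    # Root-mean-square over frequency bins, then average over frames.
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=1))))

def segmental_snr(ref, est, frame_len=256, min_db=-10.0, max_db=35.0):
    """Time-domain segmental SNR in dB with the customary per-frame
    clipping; `ref` and `est` are aligned 1-D waveforms."""
    num_frames = len(ref) // frame_len
    snrs = []
    for n in range(num_frames):
        seg_ref = ref[n * frame_len:(n + 1) * frame_len]
        seg_err = seg_ref - est[n * frame_len:(n + 1) * frame_len]
        snr = 10.0 * np.log10(np.sum(seg_ref ** 2)
                              / (np.sum(seg_err ** 2) + 1e-12) + 1e-12)
        snrs.append(np.clip(snr, min_db, max_db))  # suppress outlier frames
    return float(np.mean(snrs))
```

Lower SD and higher SSNR both indicate a separated signal closer to the clean reference, which is the sense in which the proposed SPWT feature is reported to outperform the log-spectrum and magnitude-spectrum features.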