Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation

A vector quantizer (VQ) trained on short-time frames of a particular source can form an accurate non-parametric model of that source. This principle has been used in several previous source separation and enhancement schemes as a basis for filtering the original mixture. In this paper, we propose the "projection" of a corrupted target signal onto the constrained space represented by the model as a viable approach to source separation. We investigate several parameters of VQ encoding, including a more perceptually motivated distance measure and an encoding of phase derivatives that supports reconstruction directly from the quantizer output alone. For the problem of separating speech from noise, we highlight some problems with this approach, including the need for sequential constraints (which we introduce with a simple hidden Markov model) and the choice of the best quantization for overlapping sources.
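The "projection" idea can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes frames are plain feature vectors, trains the codebook with ordinary k-means, and uses squared Euclidean distance as a stand-in for the perceptually motivated measure, which the abstract does not specify. The names `train_codebook` and `project` are hypothetical, and the phase-derivative encoding and the hidden Markov model constraint are left out.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance (an assumption; the paper's measure differs).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, frame):
    # Index of the codeword closest to the given frame.
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], frame))


def train_codebook(frames, size, iters=20, seed=0):
    # Plain k-means on clean frames of the target source.
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        buckets = [[] for _ in range(size)]
        for f in frames:
            buckets[nearest(codebook, f)].append(f)
        for k, members in enumerate(buckets):
            if members:  # keep the old codeword if no frame was assigned
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def project(codebook, frames):
    # Replace each (possibly corrupted) frame with its nearest codeword,
    # i.e. "project" it onto the space the source model can represent.
    return [list(codebook[nearest(codebook, f)]) for f in frames]


# Toy data: clean frames cluster around two spectral shapes.
clean = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
codebook = train_codebook(clean, size=2)

# A corrupted frame: the first shape plus broadband interference.
corrupted = [[1.2, 0.4, 0.5]]
estimate = project(codebook, corrupted)
```

Each frame is quantized independently here, so nothing stops the output from jumping between codewords in implausible ways. That is the kind of problem the sequential constraints mentioned above are meant to address.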
