Monaural speech/music source separation using discrete energy separation algorithm

In this paper, we address the problem of monaural source separation of a mixed signal containing speech and music components. We use the discrete energy separation algorithm (DESA) to estimate the frequency-modulating (FM) signal energy. This energy is then used to design a time-varying filter in the time-frequency domain that rejects the interfering signal. We chose the FM signal energy because it discriminates well between speech and music signals using information that is localized in both time and frequency. We present experimental results on synthetic data and real audio signals that demonstrate the advantages and limitations of the proposed method.
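As a rough illustration of the energy separation step, the sketch below implements the Teager-Kaiser energy operator and the DESA-2 variant of the algorithm in Python with NumPy. The abstract does not say which DESA variant the authors use, nor how the FM signal energy is computed from its output, so the choice of DESA-2, the function names `teager_kaiser` and `desa2`, and the `eps` guard are assumptions made only for illustration. The time-frequency filter itself is not shown.

```python
import numpy as np


def teager_kaiser(x):
    """Teager-Kaiser energy Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns values for the interior samples n = 1 .. N-2 only.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous frequency and amplitude envelope.

    Returns (omega, amp), where omega is in radians per sample. Both arrays
    cover samples n = 2 .. N-3 of the input (length N - 4).
    `eps` is a hypothetical guard against division by zero.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)[1:-1]      # Psi[x](n) aligned to n = 2 .. N-3
    y = x[2:] - x[:-2]                  # y(n) = x(n+1) - x(n-1)
    psi_y = teager_kaiser(y)            # Psi[y](n) for n = 2 .. N-3

    # Omega(n) ~ 0.5 * arccos(1 - Psi[y](n) / (2 * Psi[x](n)))
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    omega = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))

    # |a(n)| ~ 2 * Psi[x](n) / sqrt(Psi[y](n))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return omega, amp


# Usage: a pure sinusoid with known frequency and amplitude.
n = np.arange(200)
x = 1.5 * np.cos(0.3 * n + 0.2)
omega, amp = desa2(x)
```

For a pure sinusoid the two estimates are constant and equal to the true frequency and amplitude, which makes this a convenient sanity check. On real speech or music the signal would normally be band-pass filtered first, since the operator assumes a single AM-FM component per band; the band layout is another detail the abstract leaves open.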
