Informed Source Separation Using Iterative Reconstruction

This paper presents a technique for Informed Source Separation (ISS) of a single channel mixture, based on the Multiple Input Spectrogram Inversion (MISI) phase estimation method. The reconstruction of the source signals is iterative, alternating between a time-frequency consistency enforcement and a re-mixing constraint. A dual resolution technique is also proposed, for sharper transients reconstruction. The two algorithms are compared to a state-of-the-art Wiener-based ISS technique, on a database of fourteen monophonic mixtures, with standard source separation objective measures. Experimental results show that the proposed algorithms outperform both this reference technique and the oracle Wiener filter by up to 3 dB in distortion, at the cost of a significantly heavier computation.

[1]  Hirokazu Kameoka,et al.  Consistent Wiener Filtering: Generalized Time-Frequency Masking Respecting Spectrogram Consistency , 2010, LVA/ICA.

[2]  Lonce L. Wyse,et al.  Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Laurent Girin,et al.  Informed Source Separation of Linear Instantaneous Under-Determined Audio Mixtures by Source Index Embedding , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[4]  Nicolas Sturmel,et al.  Iterative phase reconstruction of wiener filtered signals , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[5]  Alexey Ozerov,et al.  Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Antoine Liutkus,et al.  Informed source separation: Source coding meets source separation , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[7]  Volker Gnann,et al.  Inversion of short-time fourier transform magnitude spectrograms with adaptive window lengths , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[8]  Laurent Girin,et al.  A high-capacity watermarking technique for audio signals based on MDCT-domain quantization , 2010 .

[9]  Deep Sen,et al.  Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures , 2010, IEEE Signal Processing Letters.

[10]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Brian R Glasberg,et al.  Derivation of auditory filter shapes from notched-noise data , 1990, Hearing Research.

[12]  Jürgen Herre,et al.  MPEG Spatial Audio Object Coding—The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes , 2010 .

[13]  Stanislaw Gorlow,et al.  Informed source separation: Underdetermined source signal recovery from an instantaneous stereo mixture , 2011, 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[14]  Volker Gnann,et al.  MULTIRESOLUTION STFT PHASE ESTIMATION WITH FRAME-WISE POSTERIOR WINDOW-LENGTH DECISION , 2011 .

[15]  Volker Gnann,et al.  IMPROVING RTISI PHASE ESTIMATION WITH ENERGY ORDER AND PHASE UNWRAPPING , 2010 .

[16]  Mark B. Sandler,et al.  On the use of phase and energy for musical onset detection in the complex domain , 2004, IEEE Signal Processing Letters.

[17]  Jae S. Lim,et al.  Signal estimation from modified short-time Fourier transform , 1983, ICASSP.

[18]  Jonathan Le Roux,et al.  FAST SIGNAL RECONSTRUCTION FROM MAGNITUDE STFT SPECTROGRAM BASED ON SPECTROGRAM CONSISTENCY , 2010 .

[19]  Antoine Liutkus,et al.  Informed source separation through spectrogram coding and data embedding , 2012, Signal Process..