Degenerate Unmixing Estimation Technique using the Constant Q Transform

The Degenerate Unmixing Estimation Technique (DUET) is a Blind Source Separation (BSS) algorithm for stereo audio. DUET relies on a two-dimensional amplitude-phase histogram built from the differences between the two channels, in which each peak indicates a source in the mixture. When peaks overlap, separation becomes infeasible, which is often the case for music mixtures. We propose to improve peak separation by building the histograms from time-frequency representations based on the Constant Q Transform (CQT) instead of the Fourier Transform (FT). The CQT has a logarithmic frequency resolution that matches the geometrically spaced notes of the Western musical scale. We also adaptively resize the histogram bins and apply Wiener filtering to improve peak resolution and source reconstruction. Results on mixtures of harmonic musical instruments show improved separation, especially at low frequencies and for closely spaced sources.
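
The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 12-bins-per-octave setting, and the 55 Hz starting frequency are assumptions chosen for illustration. The first two functions give the geometrically spaced CQT center frequencies and the constant quality factor Q in the standard formulation. The third computes the per-point DUET features (symmetric attenuation and relative delay) from one pair of time-frequency coefficients, following the usual DUET definitions. In a full system, these features would be accumulated over all time-frequency points into the 2D histogram.

```python
import cmath
import math


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / b)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave):
    """Constant ratio of center frequency to bandwidth: Q = 1 / (2^(1/b) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def cqt_window_length(f_k, sample_rate, q):
    """Analysis window length in samples for bin frequency f_k (longer at low f_k)."""
    return math.ceil(q * sample_rate / f_k)


def duet_features(x1, x2, omega):
    """DUET features for one time-frequency point.

    x1, x2 : complex coefficients of the left and right channels (x1 != 0)
    omega  : angular frequency of the bin, in radians per sample (omega > 0)

    Returns (alpha, delta): symmetric attenuation a - 1/a and relative delay
    in samples. The delay is unambiguous only while |omega * delta| < pi.
    """
    ratio = x2 / x1
    a = abs(ratio)
    alpha = a - 1.0 / a
    delta = -cmath.phase(ratio) / omega
    return alpha, delta


if __name__ == "__main__":
    # Hypothetical settings: semitone resolution starting at A1 (55 Hz).
    freqs = cqt_center_frequencies(55.0, 12, 25)
    q = cqt_quality_factor(12)
    print(freqs[0], freqs[12], freqs[24])
    print(q, cqt_window_length(freqs[0], 44100, q))

    # A source attenuated by 0.5 and delayed by 2 samples in channel 2.
    omega = 0.3
    x2 = 0.5 * cmath.exp(-1j * omega * 2.0)
    print(duet_features(1 + 0j, x2, omega))
```

Because the window length grows as the bin frequency falls, the CQT resolves low notes a semitone apart that a fixed-length Fourier window would merge, which is consistent with the low-frequency improvement reported above.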
