Perceptually controlled doping for audio source separation

An underdetermined audio mixture can be separated through sparse component analysis (SCA), which relies, however, on the strong hypothesis that the source signals are sparse in some domain. To overcome this difficulty when the original sources are available before the mixing process, informed source separation (ISS) embeds a watermark in the mixture, whose information can help a subsequent separation. Though powerful, this technique is generally specific to a particular mixing setup and may be compromised by an additional bitrate compression stage. Thus, instead of watermarking, we propose a ‘doping’ method that makes the time-frequency representation of each source sparser while preserving its audio quality. This method iteratively decreases the distance between the distribution of the signal and a target sparse distribution, under a perceptual constraint. We aim to show that the proposed approach is robust to audio coding and that using the sparsified signals improves source separation compared with using the original sources. In this work, the analysis is restricted to instantaneous mixtures and focuses on voice sources.
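The idea of iteratively sparsifying a signal under a bounded distortion can be illustrated with a toy sketch. This is not the authors' algorithm: the names `dope`, `kurtosis`, `max_rel_change`, `n_steps`, and `n_iter` are hypothetical, a fixed per-coefficient distortion budget stands in for the perceptual constraint, and sample kurtosis is used as a crude sparsity proxy instead of a distance to a target sparse distribution. The input is random data standing in for time-frequency coefficients.

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis (fourth standardized moment); larger values
    indicate a peakier, heavier-tailed (sparser-looking) distribution."""
    x = np.asarray(x, dtype=float)
    c = x - x.mean()
    return float(np.mean(c**4) / np.mean(c**2) ** 2)

def dope(coeffs, max_rel_change=0.3, n_steps=10, n_iter=20):
    """Toy sparsification: shrink coefficients toward zero step by step,
    projecting back onto |y - x| <= budget after every step.

    ASSUMPTION: the budget is a fixed fraction of the RMS level, a
    placeholder for a real perceptual (masking) threshold.
    """
    x = np.asarray(coeffs, dtype=float)
    budget = max_rel_change * np.sqrt(np.mean(x**2))
    step = budget / n_steps
    y = x.copy()
    for _ in range(n_iter):
        # Shrinkage step: move every coefficient toward zero by `step`.
        y = np.sign(y) * np.maximum(np.abs(y) - step, 0.0)
        # Constraint step: never deviate from the original by more than `budget`.
        y = np.clip(y, x - budget, x + budget)
    return y, budget

# Random stand-in for the time-frequency coefficients of one source.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
y, budget = dope(x)
print("kurtosis before:", kurtosis(x))
print("kurtosis after: ", kurtosis(y))
print("max deviation / budget:", np.max(np.abs(y - x)) / budget)
```

In this sketch, coefficients smaller than the budget are driven to zero and larger ones are reduced by at most the budget, so the distortion stays bounded while the coefficient distribution becomes more peaked. A real system would replace the fixed budget with a frequency-dependent perceptual threshold.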
