Informed source separation through spectrogram coding and data embedding

We address underdetermined source separation in a particular informed configuration in which both the sources and the mixtures are known during a so-called encoding stage. This knowledge makes it possible to compute side information small enough to be inaudibly embedded into the mixtures. At the decoding stage, the sources are no longer assumed to be known: only the mixtures and the extracted side information are used for source separation. The proposed system models the sources as independent, locally stationary Gaussian processes (GP) and the mixing process as linear filtering. This model allows reliable estimation of the sources through generalized Wiener filtering, provided their spectrograms are known. As these spectrograms are too large to be embedded in the mixtures, we show how they can be efficiently approximated using either Nonnegative Tensor Factorization (NTF) or image compression. A high-capacity embedding method inaudibly embeds the separation side information into the mixtures. This method applies Quantization Index Modulation to the time-frequency coefficients of the mixtures and reaches embedding rates of about 250 kbps. Finally, a study of the performance of the full system is presented.
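
To make the role of the spectrograms concrete, the following is a minimal sketch of Wiener-style separation when the source power spectrograms are available at the decoder. It is not the system described above: it assumes a single-channel mixture that is the plain sum of the sources in the time-frequency domain, whereas the abstract considers general linear filtering. The function name `wiener_separate`, the array shapes, and the `eps` regularizer are illustrative assumptions.

```python
import numpy as np


def wiener_separate(mix_stft, source_psds, eps=1e-12):
    """Estimate source STFTs from a mixture STFT and source power spectrograms.

    Assumptions (illustrative, not taken from the paper):
      mix_stft    : complex array of shape (F, T), single-channel mixture
      source_psds : nonnegative array of shape (J, F, T), one power
                    spectrogram per source (possibly an approximation)
    Returns a complex array of shape (J, F, T).
    """
    source_psds = np.asarray(source_psds, dtype=float)
    # Under the independent Gaussian model, the mixture variance in each
    # time-frequency bin is the sum of the source variances.
    total = source_psds.sum(axis=0) + eps
    # Each source gets a gain in [0, 1] equal to its share of that variance.
    gains = source_psds / total
    return gains * mix_stft[np.newaxis, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J, F, T = 3, 8, 5
    sources = rng.normal(size=(J, F, T)) + 1j * rng.normal(size=(J, F, T))
    mixture = sources.sum(axis=0)
    # Oracle spectrograms stand in for the decoded side information.
    estimates = wiener_separate(mixture, np.abs(sources) ** 2)
    print(estimates.shape)
```

In this simplified setting the estimates always sum back to the mixture, so any error in the approximated spectrograms only changes how the mixture energy is shared among the sources.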
