Coding-Based Informed Source Separation: Nonnegative Tensor Factorization Approach

Informed source separation (ISS) aims to reliably recover sources from a mixture. To this end, it relies on the assumption that the original sources are available during an encoding stage. Given both the sources and the mixture, side information may be computed and transmitted along with the mixture, whereas the original sources are no longer available. During a decoding stage, both the mixture and the side information are processed to recover the sources. ISS is motivated by a number of specific applications, including active listening and remixing of music, karaoke, and audio gaming. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encoding the sources using not only a model, as in source coding, but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, as in source coding, it can reach any quality, provided sufficient bandwidth is available. Second, as in classical ISS, it makes use of the mixture to reduce the bitrate required to transmit the sources. Furthermore, we introduce Nonnegative Tensor Factorization (NTF) as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art.
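The central CISS idea, encoding the sources with a model *and* the mixture, can be illustrated on a single time-frequency bin. The sketch below is a simplification under stated assumptions, not the method of this paper: sources are taken as real-valued zero-mean Gaussians (actual STFT coefficients are complex) whose variances follow an assumed NTF form v[j,f,n] = Σ_k Q[j,k] W[f,k] H[n,k]; the mixture is noiseless, single-channel and instantaneous; and a uniform scalar quantizer with one step size replaces any real rate allocation. All function names (`ntf_variances`, `posterior`, `encode`, `decode`, `approx_bits`) are hypothetical, and the NTF factors are random rather than estimated from the sources. Because encoder and decoder can both compute the posterior of the sources given the mixture, only the deviation from the posterior mean is quantized, in the eigenbasis (KLT) of the posterior covariance; directions fully determined by the mixture then cost no bits.

```python
import numpy as np


def ntf_variances(Q, W, H):
    """Hypothetical NTF source model (assumed form):
    v[j, f, n] = sum_k Q[j, k] * W[f, k] * H[n, k]."""
    return np.einsum("jk,fk,nk->jfn", Q, W, H)


def posterior(v, A, x):
    """Posterior mean/covariance of sources s ~ N(0, diag(v)) given the
    noiseless instantaneous mixture x = A @ s (real-valued simplification)."""
    V = np.diag(v)
    Sxx = A @ V @ A.T                       # mixture covariance
    G = V @ A.T @ np.linalg.inv(Sxx)        # Wiener gain
    cov = V - G @ A @ V
    return G @ x, 0.5 * (cov + cov.T)       # symmetrize against round-off


def encode(s, mean, cov, step):
    """Uniformly quantize s - mean in the eigenbasis of the posterior covariance."""
    _, U = np.linalg.eigh(cov)
    return np.round(U.T @ (s - mean) / step).astype(int)


def decode(idx, mean, cov, step):
    """Rebuild the sources from the indices and the decoder-side posterior."""
    _, U = np.linalg.eigh(cov)
    return mean + U @ (idx * step)


def approx_bits(variances, step):
    """High-resolution entropy estimate for uniform quantization of Gaussian
    coefficients; low-variance coefficients are counted as zero bits."""
    lam = np.clip(variances, 1e-300, None)
    return float(np.sum(np.maximum(0.0, 0.5 * np.log2(2 * np.pi * np.e * lam / step**2))))


# Usage on one time-frequency bin with random (not estimated) NTF factors.
rng = np.random.default_rng(0)
J, F, N, K = 3, 4, 5, 6
Q, W, H = rng.random((J, K)), rng.random((F, K)), rng.random((N, K))
v = ntf_variances(Q, W, H)              # shape (J, F, N)
vb = v[:, 2, 3]                         # source variances in bin (f=2, n=3)
A = np.ones((1, J))                     # single-channel sum mixture
s = rng.normal(scale=np.sqrt(vb))       # one bin of the sources
x = A @ s                               # mixture observed by the decoder

mean, cov = posterior(vb, A, x)
step = 0.05
idx = encode(s, mean, cov, step)
s_hat = decode(idx, mean, cov, step)
print("bits without mixture:", approx_bits(vb, step))
print("bits with mixture:   ", approx_bits(np.linalg.eigvalsh(cov), step))
```

Under this high-resolution approximation, the estimated rate using the posterior is never larger than using the prior alone, since conditioning on the mixture can only shrink the covariance. In this noiseless single-channel case, the decoded sources also sum exactly to the mixture, because the direction along which the mixture constrains the sources has zero posterior variance.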
