Nonnegative signal factorization with learnt instrument models for sound source separation in close-microphone recordings

Close-microphone techniques are extensively employed in live music recordings, as they improve interference rejection and reduce the amount of reverberation in the resulting instrument tracks. However, despite the use of directional microphones, the recorded tracks are not completely free from source interference, a problem commonly known as microphone leakage. While source separation methods are a potential solution to this problem, few approaches take into account the large amount of prior information available in this scenario. In fact, besides the special properties of close-microphone tracks, knowledge of the number and type of instruments making up the mixture can also be exploited to improve separation performance. In this paper, a nonnegative matrix factorization (NMF) method making use of all the above information is proposed. To this end, a set of instrument models is learnt from a training database and incorporated into a multichannel extension of the NMF algorithm. Several options for initializing the algorithm are suggested, and their performance is evaluated on multiple music tracks and compared with other state-of-the-art approaches.
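To make the idea of factorizing with pre-learnt instrument models concrete, the following is a minimal single-channel sketch, not the method proposed in the paper. It assumes the instrument templates are already available as the columns of a fixed basis matrix `W`, and it estimates only the activations `H` using the standard multiplicative update for the Kullback-Leibler divergence. The function names, the toy data, and the assignment of three templates per instrument are all hypothetical choices made for illustration; the multichannel extension and the initialization options mentioned in the abstract are not modelled here.

```python
import numpy as np


def fit_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 with V ~ W @ H while keeping the basis W fixed.

    Uses the standard multiplicative update for the KL divergence.
    V: (n_freq, n_frames) nonnegative magnitude spectrogram.
    W: (n_freq, n_templates) nonnegative, pre-learnt templates (assumed given).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]  # W^T 1, constant because W is fixed
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (col_sums + eps)
    return H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between two nonnegative arrays."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


# Toy data: 6 hypothetical templates, the first 3 standing in for one instrument.
rng = np.random.default_rng(1)
W = rng.random((64, 6))
H_true = rng.random((6, 40))
V = W @ H_true

H = fit_activations(V, W)

# Soft (Wiener-like) mask selecting the part of V explained by templates 0..2.
V_hat = W @ H + 1e-12
mask_a = (W[:, :3] @ H[:3]) / V_hat
source_a = mask_a * V
```

Keeping `W` fixed is what ties each group of activations to a known instrument, so the separated spectrogram can be read off from the corresponding columns without a clustering step afterwards.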
