Improving PLCA-based score-informed source separation with invertible Constant-Q Transforms

Probabilistic Latent Component Analysis (PLCA) is a widely adopted variant of Nonnegative Matrix Factorization (NMF) for single-channel audio source separation. It has seen many extensions, including the incorporation of prior information derived from music scores. Recent work on the invertibility of the Constant-Q Transform (CQT) makes it a viable alternative to the Short-time Fourier Transform (STFT) as the underlying data representation. In this paper we assess several CQT implementations for their usability in score-informed source separation. We show that the results are comparable to, and in some cases better than, those obtained with the STFT, and that exact transform invertibility is not a significant factor in this application.
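
For orientation, the standard symmetric PLCA model and its EM updates can be written as below. This is a sketch of the textbook formulation, not necessarily the exact variant used in this paper. The symbols are assumptions introduced here: $V_{ft}$ is the nonnegative magnitude time-frequency representation (STFT or CQT), $f$ indexes frequency bins, $t$ indexes time frames, and $z$ indexes latent components.

```latex
% Model: the normalized spectrogram is treated as a joint distribution
P(f,t) = \sum_{z} P(z)\, P(f \mid z)\, P(t \mid z)

% E-step: posterior of each component at every time-frequency point
P(z \mid f,t) = \frac{P(z)\, P(f \mid z)\, P(t \mid z)}
                     {\sum_{z'} P(z')\, P(f \mid z')\, P(t \mid z')}

% M-step: re-estimate the factors from the weighted observations
P(f \mid z) = \frac{\sum_{t} V_{ft}\, P(z \mid f,t)}
                   {\sum_{f',t} V_{f't}\, P(z \mid f',t)}
\qquad
P(t \mid z) = \frac{\sum_{f} V_{ft}\, P(z \mid f,t)}
                   {\sum_{f,t'} V_{ft'}\, P(z \mid f,t')}
\qquad
P(z) = \frac{\sum_{f,t} V_{ft}\, P(z \mid f,t)}
            {\sum_{f,t} V_{ft}}
```

One common way to make such a model score-informed, assumed here for illustration, is to initialize $P(t \mid z)$ to zero wherever the score marks the corresponding note as inactive. Because the updates are multiplicative, entries that start at zero stay at zero.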
