A Probabilistic Subspace Model for Multi-instrument Polyphonic Transcription

In this paper we present a general probabilistic model suitable for transcribing single-channel audio recordings containing multiple polyphonic sources. Our system requires no prior knowledge of the instruments in the mixture, although it can benefit from this information if available. In contrast to many existing polyphonic transcription systems, our approach explicitly models the individual instruments and is thereby able to assign detected notes to their respective sources. We use a set of training instruments to learn a model space which is then used during transcription to constrain the properties of models fit to the target mixture. In addition, we encourage model sparsity using a simple approach related to tempering. We evaluate our method on both recorded and synthesized two-instrument mixtures, obtaining average framelevel F-measures of up to 0.60 for synthesized audio and 0.53 for recorded audio. If knowledge of the instrument types in the mixture is available, we can increase these measures to 0.68 and 0.58, respectively, by initializing the model with parameters from similar instruments.

[1]  A. Klapuri,et al.  Analysis of polyphonic audio using source-filter model and non-negative matrix factorization , 2006 .

[2]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[3]  Daniel P. W. Ellis,et al.  Multi-voice polyphonic music transcription using eigeninstruments , 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[4]  Bhiksha Raj,et al.  A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds , 2009, NIPS.

[5]  Matthew Brand,et al.  Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction , 1999, Neural Computation.

[6]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Analysis , 1999, UAI.

[7]  Hideki Kawahara,et al.  YIN, a fundamental frequency estimator for speech and music. , 2002, The Journal of the Acoustical Society of America.

[8]  Bhiksha Raj,et al.  Probabilistic Latent Variable Models as Nonnegative Factorizations , 2008, Comput. Intell. Neurosci..

[9]  Emmanuel Vincent,et al.  Harmonic and inharmonic Nonnegative Matrix Factorization for Polyphonic Pitch transcription , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  Éric Gaussier,et al.  Relation between PLSA and NMF and implications , 2005, SIGIR '05.

[11]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[12]  Roland Kuhn,et al.  Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[13]  Mark D. Plumbley,et al.  Polyphonic music transcription by non-negative sparse coding of power spectra , 2004 .