Coupled tensor factorization models for polyphonic music transcription

Generalized Coupled Tensor Factorization (GCTF) is a recently proposed algorithmic framework for simultaneously estimating tensor factorization models in which several tensors can share a set of latent factors. This paper presents two models in this framework for transcribing polyphonic piano pieces. The first model is based on Non-negative Matrix Factorization (NMF), where the coupling provides spectral information to the model. Extending the first, the second model incorporates temporal and harmonic information by taking a rough, incomplete transcription of the piece as input. Incorporating harmonic knowledge improves transcription quality: experiments on real piano data show an F-measure improvement of around 23%.
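To make the idea of coupling concrete, the following is a minimal sketch of a coupled NMF in which two spectrograms share one spectral dictionary. It is not the paper's model or the GCTF update rules: it assumes a KL-divergence cost with standard multiplicative updates, and the names `coupled_kl_nmf`, `x1` (the piece to transcribe), `x2` (auxiliary isolated-note recordings), `d` (shared spectral templates), `e` and `g` (activations) are hypothetical, chosen only for illustration.

```python
import numpy as np


def kl_div(x, x_hat, eps=1e-12):
    """Generalized KL divergence between non-negative arrays x and x_hat."""
    return float(np.sum(x * np.log((x + eps) / (x_hat + eps)) - x + x_hat))


def coupled_kl_nmf(x1, x2, n_components, n_iter=200, seed=0, eps=1e-12):
    """Jointly factorize x1 ~ d @ e and x2 ~ d @ g with a shared dictionary d.

    x1: (n_freq, n_frames_1) magnitude spectrogram of the piece (assumed input).
    x2: (n_freq, n_frames_2) magnitude spectrogram of auxiliary audio that
        supplies spectral information through the shared factor d.
    Returns d, e, g and the total cost after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq = x1.shape[0]
    d = rng.random((n_freq, n_components)) + 0.1
    e = rng.random((n_components, x1.shape[1])) + 0.1
    g = rng.random((n_components, x2.shape[1])) + 0.1
    losses = []
    for _ in range(n_iter):
        # Activation updates: each depends only on its own observed matrix.
        col_sums = d.sum(axis=0)[:, None] + eps
        e *= (d.T @ (x1 / (d @ e + eps))) / col_sums
        g *= (d.T @ (x2 / (d @ g + eps))) / col_sums
        # Shared dictionary update: contributions from both matrices are summed.
        r1 = x1 / (d @ e + eps)
        r2 = x2 / (d @ g + eps)
        denom = (e.sum(axis=1) + g.sum(axis=1) + eps)[None, :]
        d *= (r1 @ e.T + r2 @ g.T) / denom
        losses.append(kl_div(x1, d @ e) + kl_div(x2, d @ g))
    return d, e, g, losses


if __name__ == "__main__":
    # Synthetic non-negative data standing in for real spectrograms.
    rng = np.random.default_rng(1)
    true_d = rng.random((64, 4))
    x1 = true_d @ rng.random((4, 50))
    x2 = true_d @ rng.random((4, 30))
    d, e, g, losses = coupled_kl_nmf(x1, x2, n_components=4)
    print("first cost:", losses[0], "last cost:", losses[-1])
```

In this simplified setting the rows of `e` play the role of note activations over time, which a transcription system would then threshold to obtain note events. The second model described above would add further coupled observations, such as the rough transcription, which this sketch does not attempt.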
