Automatic Transcription of Music Audio Through Continuous Parameter Tracking

We present a method for transcribing arbitrary pitched music into a piano-roll-like representation that also tracks the amplitudes of the notes over time. We develop a probabilistic model that gives the likelihood of a frame of audio data given a vector of amplitudes for the possible notes. Using an approximation of the log-likelihood function, we derive an objective function that is quadratic in the time-varying amplitude variables and also depends on the discrete piano-roll variables. We optimize this function with a variant of dynamic programming that repeatedly grows and prunes a set of candidate histories. We present results on a variety of examples using several measures of performance, including an edit-distance measure and a frame-by-frame measure.
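To make the grow-and-prune idea concrete, the following is a minimal toy sketch, not the model or algorithm of the paper. It assumes each frame is already reduced to one observed energy value per candidate note, and it replaces the paper's approximate log likelihood with a simple quadratic residual: an active note may fit its amplitude exactly (residual 0), while an inactive note leaves the squared observation unexplained. The names `frame_cost`, `transcribe`, `on_penalty`, `switch_penalty`, and `beam` are all hypothetical.

```python
from itertools import product


def frame_cost(y, state, on_penalty=0.3):
    """Toy per-frame cost for one on/off piano-roll state.

    For an active note the amplitude a minimizing (obs - a)**2 is a = obs,
    so the residual is 0 and only a fixed activity penalty is paid.
    For an inactive note a = 0, so the residual is obs**2.
    """
    cost = 0.0
    for obs, on in zip(y, state):
        cost += on_penalty if on else obs * obs
    return cost


def transcribe(frames, n_notes, beam=4, switch_penalty=0.1):
    """Grow every surviving history by one frame, then prune to the best `beam`."""
    states = list(product((0, 1), repeat=n_notes))
    histories = [(0.0, [])]  # each history is (accumulated cost, list of states)
    for y in frames:
        grown = []
        for cost, path in histories:
            for s in states:
                # Penalize each note that changes its on/off status between frames.
                switches = sum(a != b for a, b in zip(path[-1], s)) if path else 0
                new_cost = cost + frame_cost(y, s) + switch_penalty * switches
                grown.append((new_cost, path + [s]))
        grown.sort(key=lambda h: h[0])
        histories = grown[:beam]  # prune: keep only the lowest-cost histories
    return histories[0][1]


if __name__ == "__main__":
    # Two candidate notes: the first sounds for two frames, then the second.
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(transcribe(frames, n_notes=2))
```

Enumerating all on/off states per frame is only feasible for a handful of notes; the sketch is meant to show the control flow of growing and pruning histories, and pruning makes the search approximate rather than exact.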
