Polyphonic music transcription using note onset and offset detection

This paper proposes an approach to polyphonic music transcription based on joint multiple-F0 estimation and note onset/offset detection. As preprocessing, the resonator time-frequency image of the input music signal is extracted and noise suppression is applied. A pitch salience function is then computed for each frame, along with tuning and inharmonicity parameters. Onset detection uses late fusion, combining two novel descriptors: a spectral flux-based feature that incorporates pitch tuning information, and a feature derived from the salience function. For each segment delimited by two consecutive onsets, an overlapping partial treatment procedure is applied and candidate pitch sets are ranked with a proposed pitch set score function. Note offsets are detected using HMMs trained on MIDI data. The system was trained on piano chords and tested on classical and jazz recordings from the RWC database. Improved transcription results are reported compared to state-of-the-art approaches.
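To make the onset detection stage more concrete, the following is a minimal sketch of late fusion over two onset descriptors. It is not the authors' implementation: the abstract does not give the fusion rule, the tuning-aware flux, or the peak-picking step, so the plain half-wave rectified spectral flux, the weighted-sum fusion, the fixed threshold, and all function names (`spectral_flux`, `late_fusion`, `pick_onsets`) are assumptions made for illustration.

```python
from typing import List, Sequence


def spectral_flux(frames: Sequence[Sequence[float]]) -> List[float]:
    """Half-wave rectified spectral flux between consecutive magnitude frames.

    Assumption: a plain flux, without the pitch tuning information the paper adds.
    """
    flux = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux


def normalise(x: Sequence[float]) -> List[float]:
    """Scale a descriptor by its maximum; a descriptor with no positive peak maps to zeros."""
    peak = max(x) if x else 0.0
    return [v / peak if peak > 0 else 0.0 for v in x]


def late_fusion(a: Sequence[float], b: Sequence[float], weight: float = 0.5) -> List[float]:
    """Weighted sum of two normalised descriptors (hypothetical fusion rule)."""
    na, nb = normalise(a), normalise(b)
    return [weight * u + (1.0 - weight) * v for u, v in zip(na, nb)]


def pick_onsets(curve: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return frame indices that are local maxima at or above the threshold."""
    onsets = []
    for i, v in enumerate(curve):
        left = curve[i - 1] if i > 0 else float("-inf")
        right = curve[i + 1] if i + 1 < len(curve) else float("-inf")
        if v >= threshold and v > left and v >= right:
            onsets.append(i)
    return onsets


if __name__ == "__main__":
    # Toy two-bin magnitude spectrogram: energy appears at frames 2 and 4.
    frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    # Made-up salience-based descriptor, standing in for the paper's second feature.
    salience_change = [0.0, 0.0, 0.8, 0.0, 0.9, 0.1]
    fused = late_fusion(spectral_flux(frames), salience_change)
    print(pick_onsets(fused))
```

On this toy input both descriptors peak at frames 2 and 4, so the fused curve yields those two onsets. Normalising each descriptor before summing keeps either one from dominating purely because of its scale, which is the usual motivation for fusing at the decision-curve level.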
