A Multiple-F0 Estimation Approach Based on Gaussian Spectral Modelling for Polyphonic Music Transcription

Abstract This paper proposes a multiple-F0 estimation algorithm for automatic polyphonic music transcription. The algorithm operates at frame level, searching at each audio frame for the set of fundamental frequencies that minimizes a spectral distance measure. The measure is defined under the assumption that a polyphonic sound can be modelled as a weighted sum of Gaussian spectral models. Because the spectral content of a polyphonic music signal at the current frame depends to a large extent on the immediately preceding frames, the distance measure takes into account information not only from the current frame but also from several previous ones. A Hidden Markov Model (HMM) applied as a final stage yields an additional performance improvement. The algorithm is tested on real-world polyphonic music recordings taken from the RWC music database. Accuracy rates are reported for the algorithm run under different conditions, together with a breakdown of the total error into three categories: substitutions, misses and false alarms. Finally, a comparison with five recent state-of-the-art transcription systems is presented.
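
The frame-level search described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the harmonic Gaussian shape with 1/h amplitudes, the least-squares residual used as the spectral distance, and the exhaustive search over a small candidate list. The use of previous frames and the HMM stage are not modelled.

```python
import itertools
import numpy as np

def gaussian_note_model(freqs, f0, n_harmonics=8, sigma_hz=10.0):
    """Hypothetical single-note model: one Gaussian per harmonic of f0."""
    model = np.zeros_like(freqs)
    for h in range(1, n_harmonics + 1):
        # Assumed 1/h amplitude decay; the paper's actual envelope is not given here.
        model += np.exp(-0.5 * ((freqs - h * f0) / sigma_hz) ** 2) / h
    return model

def spectral_distance(spectrum, freqs, f0_set):
    """Assumed distance: squared residual after fitting non-negative note weights."""
    basis = np.stack([gaussian_note_model(freqs, f0) for f0 in f0_set], axis=1)
    weights, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    weights = np.clip(weights, 0.0, None)  # crude non-negativity, for illustration only
    return float(np.sum((spectrum - basis @ weights) ** 2))

def estimate_f0_set(spectrum, freqs, candidates, max_polyphony=2, min_gain=1e-9):
    """Exhaustively search candidate F0 sets for the smallest spectral distance."""
    best_set, best_dist = (), float(np.sum(spectrum ** 2))
    for k in range(1, max_polyphony + 1):
        for combo in itertools.combinations(candidates, k):
            dist = spectral_distance(spectrum, freqs, combo)
            # Accept a set only if it improves the fit by a clear margin,
            # so larger sets are not preferred on ties.
            if dist < best_dist - min_gain:
                best_set, best_dist = combo, dist
    return best_set
```

As a usage example, a synthetic frame built as `1.0 * gaussian_note_model(freqs, 220.0) + 0.6 * gaussian_note_model(freqs, 330.0)` on `freqs = np.linspace(0.0, 4000.0, 2048)` can be passed to `estimate_f0_set` with a short list of candidate F0s. Exhaustive search scales poorly with polyphony, so a practical system would need to prune the candidate sets.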
