Polyphonic music transcription using note event modeling

This paper proposes a method for the automatic transcription of real-world music signals, including a variety of musical genres. The method transcribes notes played with pitched musical instruments. Percussive sounds, such as drums, may be present, but they are not transcribed. Musical notations (i.e., MIDI files) are produced from acoustic stereo input files using probabilistic note event modeling. Note events are described with a hidden Markov model (HMM). The model uses three acoustic features extracted with a multiple fundamental frequency (F0) estimator to calculate the likelihoods of different notes, and it performs temporal segmentation of notes. The transitions between notes are controlled with a musicological model involving musical key estimation and bigram models. The final transcription is obtained by searching for several paths through the note models. Evaluation was carried out with a realistic music database. Using strict evaluation criteria, 39% of all the notes were found (recall) and 41% of the transcribed notes were correct (precision). Given the complexity of the considered transcription task, the results are encouraging.
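
The recall and precision figures above are note-level counts: recall is the share of reference notes that were found, and precision is the share of transcribed notes that were correct. The following is a minimal sketch of how such figures could be computed. It is not the evaluation procedure of the paper: the matching rule (same MIDI pitch, onset within a tolerance), the greedy one-to-one pairing, the 0.15 s tolerance, and the names `match_notes` and `precision_recall` are all assumptions made for illustration.

```python
from typing import List, Tuple

# A note is represented here as (MIDI pitch, onset time in seconds).
# This representation is an assumption for the sketch.
Note = Tuple[int, float]


def match_notes(reference: List[Note], transcribed: List[Note],
                onset_tol: float = 0.15) -> int:
    """Count transcribed notes that match a reference note.

    Assumed criterion: identical pitch and onset within `onset_tol` seconds.
    Each reference note can be matched at most once (greedy pairing).
    """
    unused = list(reference)
    hits = 0
    for pitch, onset in transcribed:
        for ref in unused:
            if ref[0] == pitch and abs(ref[1] - onset) <= onset_tol:
                unused.remove(ref)  # consume the reference note
                hits += 1
                break
    return hits


def precision_recall(reference: List[Note], transcribed: List[Note],
                     onset_tol: float = 0.15) -> Tuple[float, float]:
    """Return (precision, recall) for a transcription."""
    hits = match_notes(reference, transcribed, onset_tol)
    precision = hits / len(transcribed) if transcribed else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for the example.
    reference = [(60, 0.0), (64, 0.5), (67, 1.0), (72, 1.5)]
    transcribed = [(60, 0.02), (64, 0.9), (67, 1.05)]
    p, r = precision_recall(reference, transcribed)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In the toy data, two of the three transcribed notes match a reference note, and two of the four reference notes are found. A stricter criterion, for example one that also compares note offsets, would lower both figures.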