Melody Transcription From Music Audio: Approaches and Evaluation

Although analyzing an audio recording of a music performance is a complex task, difficult even for a human listener, there are limited forms of information that can be tractably extracted and still enable interesting applications. We discuss melody (roughly, the part a listener might whistle or hum) as one such reduced descriptor of music audio, and consider how it might be defined and what uses it might serve. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were run, and a discussion of the results. For our definition of melody, current systems achieve around 70% correct transcription at the frame level, including deciding whether the melody is present or absent in each frame. Melodies transcribed at this level are readily recognizable and show promise for practical applications.
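The frame-level figure quoted above combines pitch accuracy with detection of whether the melody is present. As a rough illustration only (the exact tolerances and scoring details of the 2004 and 2005 evaluations are not restated here), the sketch below computes an overall frame-level accuracy in which a frame is correct either when both the reference and the estimate agree that no melody is present, or when both are voiced and the estimated pitch lies within an assumed half-semitone (50 cent) tolerance of the reference. The function name, the 50 cent tolerance, and the use of 0 Hz to mark unvoiced frames are illustrative assumptions, not quoted details of the evaluations.

```python
import numpy as np

def frame_level_accuracy(ref_hz, est_hz, cent_tolerance=50.0):
    """Illustrative overall frame-level accuracy for melody transcription.

    ref_hz, est_hz : per-frame fundamental frequencies in Hz, where a
        non-positive value means "no melody present" in that frame.
    A frame counts as correct if both streams agree the melody is absent,
    or if both mark it present and the pitches differ by no more than
    `cent_tolerance` cents (an assumed half-semitone tolerance).
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)

    ref_voiced = ref_hz > 0
    est_voiced = est_hz > 0

    # Unvoiced frames: correct when both sides agree the melody is absent.
    correct = ~ref_voiced & ~est_voiced

    # Voiced frames: correct when both sides are voiced and the pitch
    # error, in cents (1200 * |log2(f_est / f_ref)|), is within tolerance.
    both = ref_voiced & est_voiced
    cents = np.zeros_like(ref_hz)
    cents[both] = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    correct |= both & (cents <= cent_tolerance)

    return correct.mean()

# Toy example: four frames with one voicing error and one pitch error.
ref = [220.0, 220.0, 0.0, 246.9]
est = [221.0, 0.0,   0.0, 260.0]
print(frame_level_accuracy(ref, est))  # -> 0.5
```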
