Melody Transcription From Music Audio: Approaches and Evaluation

Although analyzing an audio recording of a music performance is a complex task, difficult even for a human listener, there are limited forms of information that can be tractably extracted and still enable interesting applications. We discuss melody (roughly, the part a listener might whistle or hum) as one such reduced descriptor of music audio, and consider how it might be defined and what uses it might serve. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were run, and a discussion of the results. For our definition of melody, current systems achieve around 70% correct transcription at the frame level, including deciding whether the melody is present or absent in each frame. Melodies transcribed at this level are readily recognizable and show promise for practical applications.
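The frame-level figure quoted above combines pitch accuracy with detection of whether the melody is present. As a rough illustration only (the exact tolerances and scoring details of the 2004 and 2005 evaluations are not restated here), the sketch below computes an overall frame-level accuracy in which a frame is correct either when both the reference and the estimate agree that no melody is present, or when both are voiced and the estimated pitch lies within an assumed half-semitone (50 cent) tolerance of the reference. The function name, the 50 cent tolerance, and the use of 0 Hz to mark unvoiced frames are illustrative assumptions, not quoted details of the evaluations.

```python
import numpy as np

def frame_level_accuracy(ref_hz, est_hz, cent_tolerance=50.0):
    """Illustrative overall frame-level accuracy for melody transcription.

    ref_hz, est_hz : per-frame fundamental frequencies in Hz, where a
        non-positive value means "no melody present" in that frame.
    A frame counts as correct if both streams agree the melody is absent,
    or if both mark it present and the pitches differ by no more than
    `cent_tolerance` cents (an assumed half-semitone tolerance).
    """
    ref_hz = np.asarray(ref_hz, dtype=float)
    est_hz = np.asarray(est_hz, dtype=float)

    ref_voiced = ref_hz > 0
    est_voiced = est_hz > 0

    # Unvoiced frames: correct when both sides agree the melody is absent.
    correct = ~ref_voiced & ~est_voiced

    # Voiced frames: correct when both sides are voiced and the pitch
    # error, in cents (1200 * |log2(f_est / f_ref)|), is within tolerance.
    both = ref_voiced & est_voiced
    cents = np.zeros_like(ref_hz)
    cents[both] = 1200.0 * np.abs(np.log2(est_hz[both] / ref_hz[both]))
    correct |= both & (cents <= cent_tolerance)

    return correct.mean()

# Toy example: four frames with one voicing error and one pitch error.
ref = [220.0, 220.0, 0.0, 246.9]
est = [221.0, 0.0,   0.0, 260.0]
print(frame_level_accuracy(ref, est))  # -> 0.5
```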
