Mid-Level Music Melody Representation of Polyphonic Audio for Query-by-Humming System

Content-based multimedia retrieval, which lets users find and locate audio-visual material according to intrinsic characteristics of the target, has recently attracted considerable attention. Query-by-humming (QBH) is one such application: it retrieves music based on a major characteristic of music, its melody. Several QBH systems have been studied, but most of them retrieve music from symbolic music data using a hummed query. From a usability standpoint, however, retrieval from polyphonic raw audio would be more useful in applications such as Internet music search or a music jukebox, where music is stored not in symbolic form but as raw digital audio, the more natural format for consumption. Our focus is on realizing query-by-humming technology for an easy-to-use application, which entails fully automating all processes of the system, including melody information extraction from polyphonic raw audio. In our system, the melody features of the music database and of the humming are represented not by distinct note information but by the probability of note occurrence. Similarity between the melody features of the humming and of the music data is then measured using a dynamic programming (DP) matching method. This paper presents the algorithms developed and the experimental results for the key steps of a QBH system: melody feature extraction from polyphonic audio and from humming, the representation of those features for matching, and the method for matching the melody information extracted from polyphonic audio against that extracted from humming.
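The matching step described above can be illustrated with a minimal sketch. The abstract does not specify the local cost, the path constraints, or the normalization of the DP matching, so everything below is an assumption: each frame is taken to be a vector of note-occurrence probabilities, the local cost is a cosine distance, and the alignment is plain dynamic time warping over whole sequences. The names `cosine_distance` and `dp_match` are hypothetical and do not come from the paper.

```python
from math import inf, sqrt


def cosine_distance(p, q):
    """Local cost between two note-occurrence probability vectors.

    Assumed choice for illustration; the paper's actual cost is not given.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    if norm == 0.0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return max(0.0, 1.0 - dot / norm)  # clamp tiny negative rounding errors


def dp_match(query, target):
    """Length-normalized DTW cost between two frame sequences (lower is more similar)."""
    n, m = len(query), len(target)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = best accumulated cost aligning query[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cosine_distance(query[i - 1], target[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query frame repeats (hummed more slowly)
                cost[i][j - 1],      # target frame repeats (hummed faster)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m] / (n + m)


# Toy data: three pitch classes, one probability vector per frame.
song = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
hum = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]  # same tune, slower
other = [[0, 0, 1], [0, 0, 1], [1, 0, 0]]

same_tune_cost = dp_match(hum, song)
other_tune_cost = dp_match(hum, other)
```

In this toy case the slower hummed version aligns with the song at a lower cost than with the unrelated sequence, which is the property a DP matcher relies on to tolerate tempo differences. A practical system would also have to handle a query that starts in the middle of a song and is sung in a different key, neither of which this sketch attempts.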
