A Query-by-Singing System for Retrieving Karaoke Music

This paper investigates the problem of retrieving karaoke music using query-by-singing techniques. Unlike regular CD music, where the stereo sound involves two audio channels that usually sound the same, karaoke music encompasses two distinct channels in each track: one is a mixture of the lead vocals and background accompaniment, and the other consists of accompaniment only. Although the two audio channels are distinct, the accompaniments in the two channels often resemble each other. We exploit this characteristic to: i) infer the background accompaniment for the lead vocals from the accompaniment-only channel, so that the main melody underlying the lead vocals can be extracted more effectively, and ii) detect phrase onsets based on the Bayesian information criterion (BIC) to predict the points in a song where a user's sung query may begin, so that the similarity between the melodies of the query and the song can be examined more efficiently. To further refine extraction of the main melody, we propose correcting potential errors in the estimated sung notes by exploiting a composition characteristic of popular songs whereby the sung notes within a verse or chorus section usually vary by no more than two octaves. In addition, to facilitate an efficient and accurate search of a large music database, we employ multiple-pass dynamic time warping (DTW) combined with multiple-level data abstraction (MLDA) to compare the similarities of melodies. The results of experiments conducted on a karaoke database comprising 1,071 popular songs demonstrate the feasibility of query-by-singing retrieval for karaoke music.
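As an illustration of the melody comparison step, the following is a minimal sketch of plain single-pass DTW between two note sequences. It is not the multiple-pass DTW or the MLDA scheme described above, whose details the abstract does not give. The function name `dtw_distance`, the representation of melodies as lists of MIDI note numbers, and the absolute-difference local cost are all assumptions made for this example.

```python
import math


def dtw_distance(query, reference):
    """Classic DTW distance between two note sequences.

    Assumption: each sequence is a list of MIDI note numbers, and the
    local cost is the absolute difference in semitones.
    """
    n, m = len(query), len(reference)
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # query notes with the first j reference notes.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query note repeats a reference note
                cost[i][j - 1],      # reference note repeats a query note
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# A query sung at twice the tempo still aligns perfectly.
fast_query = [60, 62, 64]
slow_song = [60, 60, 62, 62, 64, 64]
print(dtw_distance(fast_query, slow_song))
```

Because the warping path may stretch or compress either sequence, a query sung faster or slower than the recording can still obtain a low distance. A practical system would also need to handle key transposition and restrict the search to candidate onsets, neither of which this sketch attempts.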
