Query by Humming: In Action with Its Technology Revealed

A tune has been lingering in your head for days, but you don't know where you heard it or which song it comes from. This demo presents a query by humming system that can tell you the name of that song. Most existing query by humming systems use pitch contour to match similar melodies (for example, [1]). The user's humming is transcribed into a sequence of discrete notes, and contour information is extracted from them: each note is encoded with a letter ("U", "D", or "S") indicating whether it is above, below, or the same as the previous note. The tunes in the database are represented by contour information in the same way, and the edit distance between two contour strings measures the similarity of the corresponding melodies.

Unfortunately, it is very hard to segment a user's humming into discrete notes. Some recent work therefore matches the query directly from audio, using dynamic time warping (DTW) to align the hummed query with the melodies in the music database. This improves matching quality, but at a price: a brute-force search using DTW is very slow.

The database community has studied similarity queries over time series databases for many years, and the techniques developed in that area can shed light on the query by humming problem. In this demo, we treat both the melodies in the music database and the user's humming input as time series. This approach allows us to integrate many database indexing techniques into a query by humming system, improving its quality over the traditional (contour) string database approach. We design search techniques that are invariant to shifting, time scaling, and local time warping, which makes the system robust and allows more flexible user humming input.
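To make the contour-and-edit-distance approach described above concrete, here is a minimal sketch in Python. The function names and the MIDI pitch values are illustrative, not taken from the demo system:

```python
def contour(notes):
    """Encode a note sequence as a U/D/S contour string.

    Each symbol compares a note's pitch with the previous one:
    'U' = up, 'D' = down, 'S' = same.
    """
    symbols = []
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]


# A hummed query and a database tune, as made-up MIDI pitch sequences:
query = contour([60, 62, 62, 59, 64])   # -> "USDU"
tune  = contour([60, 62, 61, 59, 64])   # -> "UDDU"
print(edit_distance(query, tune))       # -> 1
```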
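The DTW matching mentioned above can be sketched as follows. This is the textbook DTW recurrence on two pitch time series, not the demo system's implementation; its O(mn) cost per comparison is exactly why a brute-force scan over a large music database is slow:

```python
def dtw(x, y):
    """Dynamic time warping distance between two pitch time series.

    Unlike Euclidean distance, DTW allows one series to be locally
    stretched or compressed to best align with the other.
    """
    inf = float("inf")
    m, n = len(x), len(y)
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # y[j-1] stretched over several x's
                                 d[i][j - 1],      # x[i-1] stretched over several y's
                                 d[i - 1][j - 1])  # one-to-one step
    return d[m][n]


# The hummed query is sung more slowly than the stored melody, but DTW
# still aligns them (values are illustrative):
hum    = [60, 60, 62, 62, 62, 64, 64]
melody = [60, 62, 64]
print(dtw(hum, melody))  # -> 0.0: warping absorbs the tempo difference
```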
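The abstract does not spell out how the shifting and time-scaling invariance is achieved; one common way is to normalize each series before comparison, leaving only local time warping to the distance measure. The sketch below assumes that approach, and `normalize` and its resampling length are hypothetical:

```python
def normalize(series, length=32):
    """Normalize a melody time series before indexing and search.

    - Uniform time scaling: resample to a fixed length, so humming the
      whole tune faster or slower does not matter.
    - Shifting: subtract the mean pitch, so humming in a different key
      (transposition) does not matter.
    Local time warping can then be handled by a DTW-style measure on
    the normalized series.
    """
    n = len(series)
    # Nearest-neighbour resampling to `length` points.
    resampled = [series[min(n - 1, int(i * n / length))] for i in range(length)]
    mean = sum(resampled) / length
    return [v - mean for v in resampled]


# The same tune hummed a fifth higher and twice as slowly maps to the
# same normalized series:
a = normalize([60, 62, 64, 62])
b = normalize([67, 67, 69, 69, 71, 71, 69, 69])  # +7 semitones, 2x slower
print(max(abs(x - y) for x, y in zip(a, b)))     # -> 0.0
```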