Tonal representations for music retrieval: from version identification to query-by-humming

In this study, we compare different music representations for retrieving alternative performances of the same musical piece, a task commonly referred to as version identification. Given the audio signal of a song, we compute descriptors representing its melody, bass line and harmonic progression using state-of-the-art algorithms. These descriptors are then used to retrieve different versions of the same musical piece with a dynamic programming algorithm based on nonlinear time series analysis. We first evaluate the accuracy obtained with each descriptor individually, and then examine whether performance can be improved by combining these music representations (i.e., descriptor fusion). Our results show that, whilst harmony is the most reliable music representation for version identification, the melody and bass line representations also carry useful information for this task. Furthermore, we show that combining these tonal representations increases version detection accuracy. Finally, we demonstrate how the proposed version identification method can be adapted to the task of query-by-humming. We propose a melody-based retrieval approach and show that melody representations extracted from recordings of a cappella singing can be used successfully to retrieve the original song from a collection of polyphonic audio. We discuss the current limitations of the approach in the context of both version identification and query-by-humming, and outline possible solutions and directions for future research.
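
To make the matching step more concrete, the following is a minimal sketch of how two descriptor time series might be compared with a dynamic programming pass over a binary cross-recurrence matrix. It is a simplified illustration and not the method evaluated in the study: the function names, the Euclidean distance threshold `eps`, the constant `gap_penalty` and the toy one-hot "chroma" frames are all assumptions made for the example.

```python
import numpy as np


def cross_recurrence(x, y, eps=0.5):
    """Binary cross-recurrence matrix (illustrative assumption: a fixed
    Euclidean threshold). R[i, j] is True when frame i of x and frame j
    of y are closer than eps."""
    dists = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    return dists < eps


def alignment_score(R, gap_penalty=0.5):
    """Length-like score of the best local alignment found in R.

    Each cell extends the best of three predecessors, which tolerates
    moderate tempo differences between the two sequences. Matching cells
    add 1; non-matching cells subtract gap_penalty, floored at 0.
    """
    n, m = R.shape
    # Two rows/columns of zero padding so the (i-2) and (j-2) moves are valid.
    Q = np.zeros((n + 2, m + 2))
    for i in range(2, n + 2):
        for j in range(2, m + 2):
            best_prev = max(Q[i - 1, j - 1], Q[i - 2, j - 1], Q[i - 1, j - 2])
            if R[i - 2, j - 2]:
                Q[i, j] = best_prev + 1.0
            else:
                Q[i, j] = max(0.0, best_prev - gap_penalty)
    return float(Q.max())


# Toy data: one-hot 12-bin frames standing in for tonal descriptors.
query = np.eye(12)[[0, 4, 7, 0, 5, 9]]
same_piece = query.copy()
other_piece = np.eye(12)[[1, 2, 3, 6, 8, 10]]

score_same = alignment_score(cross_recurrence(query, same_piece))
score_other = alignment_score(cross_recurrence(query, other_piece))
```

In this toy setting the identical sequence scores higher than the unrelated one, so ranking a collection by such a score would place the matching version first. Under the same assumptions, a simple form of descriptor fusion could compute one score per representation (melody, bass line, harmony) and combine them, for instance by feeding them to a classifier.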
