Multimodal similarity between musical streams for cover version detection

Expressing the similarity between musical streams is a challenging task, as it involves understanding many factors that are most often blended into a single information channel: the audio stream. Consequently, separating the musical audio stream into its main melody and its accompaniment may prove useful for grounding the similarity computation in a more robust and expressive representation. In this paper, we show that treating the mixture, an estimate of its main melody, and an estimate of its accompaniment as separate modalities allows us to propose new ways of defining the similarity between musical streams. In the context of cover version detection, we show that the highest performance is achieved by jointly considering the mixture and the estimated accompaniment. As demonstrated by experiments carried out on two different evaluation databases, this scheme allows the scoring system to focus more on the chord progression, through the accompaniment, while remaining robust to potential separation errors, through the mixture.
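The abstract does not say how the mixture and accompaniment modalities are combined, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that each stream (the mixture and the estimated accompaniment) is represented as a sequence of 12-bin chroma frames, that each modality is compared with a transposition-invariant dynamic time warping (DTW) score, and that the two scores are fused by a simple weighted sum. The function names and the `weight` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one 12-bin chroma frame (assumed feature representation)


def cosine_distance(x: Frame, y: Frame) -> float:
    """1 - cosine similarity, clamped at 0; 1.0 if either frame is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 1.0
    return max(0.0, 1.0 - dot / (nx * ny))


def rotate(frame: Frame, k: int) -> List[float]:
    """Circularly shift a chroma frame up by k semitones."""
    return [frame[(i - k) % 12] for i in range(12)]


def dtw_cost(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """DTW cost between two non-empty chroma sequences, normalised by n + m."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def key_invariant_similarity(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Best DTW match over all 12 transpositions of b, mapped to (0, 1]."""
    best = min(dtw_cost(a, [rotate(f, k) for f in b]) for k in range(12))
    return 1.0 / (1.0 + best)


def fused_similarity(
    query: Dict[str, Sequence[Frame]],
    candidate: Dict[str, Sequence[Frame]],
    weight: float = 0.5,
) -> float:
    """Hypothetical late fusion: weighted sum of the mixture-based and
    accompaniment-based similarities (weight applies to the mixture)."""
    s_mix = key_invariant_similarity(query["mixture"], candidate["mixture"])
    s_acc = key_invariant_similarity(query["accompaniment"], candidate["accompaniment"])
    return weight * s_mix + (1.0 - weight) * s_acc
```

Consider a C, F, G, C progression of major-triad chroma frames and the same progression transposed up two semitones. Under this sketch, `key_invariant_similarity` scores the pair 1.0, because rotating the candidate by 10 semitones aligns every frame exactly. Keeping the mixture term in the fused score reflects the abstract's argument: the accompaniment emphasises the chord progression, and the mixture hedges against separation errors.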
