Identifying Cover Songs Using Information-Theoretic Measures of Similarity

This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, in which we compute pairwise measures of predictability between time series. We compare discrete-valued approaches, which operate on quantized audio features, with continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance (NCD) that accounts for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a dataset of 300 jazz standards and the Million Song Dataset. On both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the NCD based on string compression and on prediction, and observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over the NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance relative to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification on the Million Song Dataset.
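As an illustration of the discrete-valued setting, the following is a minimal sketch of the standard NCD (as defined by Cilibrasi and Vitányi). It is not the NCDA proposed here. The sketch assumes zlib as the compressor C(·) and uses a hypothetical `quantize` helper that maps each feature frame (for example, a chroma vector) to the index of its largest component. Neither choice is taken from the paper.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data, used here as C(.)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest shared structure; values near 1 suggest little."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def quantize(frames) -> bytes:
    """Hypothetical quantizer (an assumption, not the paper's method):
    map each feature frame to the index of its largest component,
    producing one symbol (byte) per frame."""
    return bytes(max(range(len(f)), key=f.__getitem__) for f in frames)


if __name__ == "__main__":
    rng = random.Random(0)
    riff = bytes([0, 4, 7, 4] * 75)  # strongly repetitive symbol sequence
    noise = bytes(rng.randrange(12) for _ in range(300))  # unstructured symbols
    print(ncd(riff, riff), ncd(riff, noise))
```

Because DEFLATE (the algorithm behind zlib) only looks back 32 KiB, the concatenation xy can exploit structure shared with x only when x fits within that window. Longer sequences would call for a compressor with a larger context.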