Predictability of Music Descriptor Time Series and its Application to Cover Song Detection

Intuitively, music has both predictable and unpredictable components. In this paper, we assess this qualitative statement quantitatively by fitting common time series models to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings spanning a variety of musical genres. Our findings show that music descriptor time series exhibit a certain degree of predictability not only over short time intervals, but also over mid-term and relatively long intervals. This holds independently of the descriptor, musical facet, and time series model considered. Moreover, we show that these findings are not only of theoretical relevance but can also have practical impact. To this end, we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e., different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach to cover song identification that is substantially faster, requires less storage, and still maintains highly competitive accuracy compared to state-of-the-art systems.
