A Simple Method to Determine if a Music Information Retrieval System is a “Horse”

We propose and demonstrate a simple method to explain the figure of merit (FoM) of a music information retrieval (MIR) system evaluated on a dataset. Specifically, the method determines whether the FoM comes from the system using characteristics confounded with the “ground truth” of the dataset. Akin to the controlled experiments designed to test the supposed mathematical ability of the famous horse “Clever Hans,” we perform two experiments showing that three state-of-the-art MIR systems produce excellent FoMs despite not using musical knowledge. This opens avenues for improving MIR systems, as well as their evaluation. We make available a reproducible research package so that others can apply the same method to evaluate other MIR systems.
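The logic can be made concrete with a toy example. A “horse” scores well because some characteristic is confounded with the ground truth, so an intervention that leaves the music intact but breaks the confound should collapse its FoM. The Python sketch below illustrates only that reasoning. It is not the paper's pair of experiments, which the abstract does not specify. Everything in it is hypothetical: a synthetic dataset in which label “A” happens to be recorded louder than label “B”, a classifier `horse_classifier` that keys on RMS level, and an “irrelevant” transformation `normalize_level` that rescales each excerpt to a common level without changing its waveform shape.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a signal (list of samples)."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def horse_classifier(x, threshold=0.3):
    """Hypothetical 'horse': it predicts the label from loudness alone,
    which happens to be confounded with the label in the toy dataset."""
    return "A" if rms(x) > threshold else "B"


def normalize_level(x, target=0.25):
    """Hypothetical 'irrelevant' transformation: rescale to a fixed RMS.
    The waveform shape (the 'musical content' in this toy) is unchanged."""
    gain = target / rms(x)
    return [gain * s for s in x]


def accuracy(system, data):
    """Figure of merit: fraction of (excerpt, label) pairs classified correctly."""
    return sum(system(x) == y for x, y in data) / len(data)


def make_toy_dataset(n=100, length=256, seed=0):
    """Synthetic data in which label 'A' is confounded with a louder level."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = "A" if i % 2 == 0 else "B"
        gain = 1.0 if label == "A" else 0.2  # the confound: level tracks label
        x = [gain * rng.uniform(-1.0, 1.0) for _ in range(length)]
        data.append((x, label))
    return data


if __name__ == "__main__":
    data = make_toy_dataset()
    print(accuracy(horse_classifier, data))  # 1.0
    # Apply the intervention to every test excerpt and re-measure the FoM.
    transformed = [(normalize_level(x), y) for x, y in data]
    print(accuracy(horse_classifier, transformed))  # 0.5
```

On the original data the toy system reaches perfect accuracy. After level normalization it gives every excerpt the same label, so its accuracy falls to the class prior (0.5 here). A toy system that instead relied on the waveform shape would, in principle, be unaffected by this change.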
