A Plan for Sustainable MIR Evaluation

The Music Information Retrieval Evaluation eXchange (MIREX) is a valuable community service, having established standard datasets, metrics, baselines, methodologies, and infrastructure for comparing MIR methods. Although MIREX has successfully maintained operations for over a decade, its long-term sustainability is at risk. Because input data cannot be made freely available to participants, all algorithms must run on centralized computational resources administered by a small number of people. The resulting cost grows roughly linearly with the number of submissions, exacting a significant toll on both human and financial resources, so the current paradigm becomes less tenable as participation increases. To alleviate the recurring costs of future evaluation campaigns, we propose a distributed, community-centric paradigm for system evaluation, built upon the principles of openness, transparency, reproducibility, and incremental evaluation. We argue that this proposal has the potential to reduce operating costs to sustainable levels. Moreover, the proposed paradigm would improve scalability and eventually yield large, open datasets for improving both MIR techniques and evaluation methods.
