For the RISM A/II collection of musical incipits (short extracts taken from the beginning of scores), we have established a ground truth based on the opinions of human experts. It contains correctly ranked matches for a set of given queries. These ranked lists contain groups of documents whose ranks were not significantly different; in other words, the lists are only partially ordered. To make use of all the available information when measuring the quality of retrieval results, we introduce "average dynamic recall" (ADR), which averages recall over a dynamic set of relevant documents. ADR takes into account that the ground truth reliably orders groups of matches, but not always individual matches. Dynamic recall measures how many of the documents that should have appeared at or before a given position in the result list actually have appeared. ADR at a given position averages dynamic recall over all positions up to and including that one. Our measure was first used at the MIREX 2005 Symbolic Melodic Similarity contest.
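The abstract defines ADR only in words, so the following Python sketch encodes one reading of it; the details are our assumption, not a verbatim copy of the measure's formal definition. It assumes the ground truth is an ordered list of groups (hypothetical parameter `groups`), and that at result position i the documents that "should have appeared" are all members of the group containing ground-truth position i together with every earlier group. Because any member of that group may legitimately occupy position i, reordering documents within a group is not penalized. The function names are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def dynamic_recall_at(result: Sequence[Hashable],
                      groups: Sequence[Sequence[Hashable]],
                      i: int) -> float:
    """Dynamic recall at 1-based position i (assumed definition)."""
    # Collect the documents of all groups up to and including the group
    # that contains ground-truth position i; any of them may appear
    # among the first i results.
    allowed: set = set()
    covered = 0
    for group in groups:
        allowed.update(group)
        covered += len(group)
        if covered >= i:
            break
    # Fraction of the first i results that belong to the allowed set.
    # Missing positions (result shorter than i) count as misses.
    hits = sum(1 for doc in result[:i] if doc in allowed)
    return hits / i


def average_dynamic_recall(result: Sequence[Hashable],
                           groups: Sequence[Sequence[Hashable]],
                           n: Optional[int] = None) -> float:
    """Average dynamic recall over positions 1..n (assumed definition).

    By default n is the total number of documents in the ground truth.
    Assumes the result list contains no duplicate documents.
    """
    total = sum(len(g) for g in groups)
    if n is None:
        n = total
    if not 1 <= n <= total:
        raise ValueError("n must lie between 1 and the ground-truth size")
    return sum(dynamic_recall_at(result, groups, i)
               for i in range(1, n + 1)) / n


if __name__ == "__main__":
    # Hypothetical ground truth: {a, b} ranked before {c}, before {d, e}.
    groups = [["a", "b"], ["c"], ["d", "e"]]
    print(round(average_dynamic_recall(["b", "a", "d", "c", "x"], groups), 4))
    # prints 0.8933
```

In this example, swapping "a" and "b" costs nothing because they share a group, while retrieving "d" before "c" lowers recall at position 3, and the irrelevant "x" lowers it at position 5.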