Applying Machine Learning Diversity Metrics to Data Fusion in Information Retrieval

The Supervised Machine Learning task of classification has parallels with Information Retrieval (IR): in each case, items (documents in the case of IR) are required to be categorised into discrete classes (relevant or non-relevant). Thus a parallel can also be drawn between classifier ensembles, where evidence from multiple classifiers are combined to achieve a superior result, and the IR data fusion task. This paper presents preliminary experimental results on the applicability of classifier ensemble diversity metrics in data fusion. Initial results indicate a relationship between the quality of the fused result set (as measured by MAP) and the diversity of its inputs