Combining Evaluation Metrics with a Unanimous Improvement Ratio and its Application to the Web People Search Clustering Task

This paper presents the Unanimous Improvement Ratio (UIR), a measure that makes it possible to compare systems evaluated with two metrics without depending on the relative weighting of those metrics. For clustering tasks, such a measure is necessary given the trade-off between precision-oriented and recall-oriented metrics (e.g., Purity and Inverse Purity), which usually depends on a clustering threshold parameter set in the algorithm. Our empirical results show that (1) UIR rewards system improvements that are robust with respect to metric weighting schemes, (2) UIR reflects improvement ranges, and (3) although it is a non-parametric measure, it is sensitive enough to detect most robust system improvements. The application of UIR to the second Web People Search evaluation campaign (WePS-2) shows that UIR successfully complements the results offered by a conventional metric combination approach (such as Van Rijsbergen’s F measure).
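To illustrate the trade-off the abstract refers to, the sketch below (illustrative values and function names only; it does not reproduce the paper's UIR definition) contrasts Van Rijsbergen's α-weighted F combination of Purity and Inverse Purity, whose ranking of systems can change with α, with a weight-free check that only declares a gain when both metrics improve at once.

```python
# Illustrative sketch (not from the paper): weighted F combination of
# Purity / Inverse Purity versus a weight-independent unanimity check.

def f_measure(purity, inverse_purity, alpha=0.5):
    """Van Rijsbergen's F: harmonic combination weighted by alpha."""
    if purity == 0 or inverse_purity == 0:
        return 0.0
    return 1.0 / (alpha / purity + (1.0 - alpha) / inverse_purity)

def unanimous_improvement(scores_a, scores_b):
    """True if system A is at least as good as B on every metric and
    strictly better on at least one (no metric weighting involved)."""
    at_least_as_good = all(a >= b for a, b in zip(scores_a, scores_b))
    strictly_better = any(a > b for a, b in zip(scores_a, scores_b))
    return at_least_as_good and strictly_better

# Hypothetical (Purity, Inverse Purity) scores for two systems.
system_a = (0.80, 0.70)
system_b = (0.75, 0.72)

print(f_measure(*system_a, alpha=0.5))  # ~0.747: A ranks above B at alpha = 0.5 ...
print(f_measure(*system_b, alpha=0.5))  # ~0.735
print(f_measure(*system_a, alpha=0.1))  # ~0.709: ... but the ranking flips
print(f_measure(*system_b, alpha=0.1))  # ~0.723  when alpha changes.
print(unanimous_improvement(system_a, system_b))  # False: no unanimous gain
```

The point of the example is that the F-based ranking depends on the chosen weight α, whereas the unanimity check does not; UIR builds on this weight-independent notion of improvement across a test collection.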