Investigating Comparative Evaluation for Large Data

Evaluation is ubiquitous. We often need to evaluate a set of target entities and obtain their true ratings (the average ratings over the whole population) or true rankings (rankings derived from the true ratings). By the law of large numbers, average ratings computed from large samples provide a good approximation. In practice, however, because evaluation is labor-intensive, large-scale evaluation data are typically sparse, with each entity receiving only a few ratings. Consequently, observed average ratings can differ significantly from the true ratings, owing to the biased distribution of evaluators' standards and preferences. In this paper, we investigate comparative evaluation, which addresses this evaluation bias problem to improve evaluation accuracy. The principal idea is to first extract a partial list over the entities rated by each evaluator, and then aggregate all the partial lists into a total list that closely approximates the true ranking. The aggregated total list can in turn be used to estimate the true ratings. We also study the associated problem of evaluation assignment (assigning target entities to evaluators), for which we propose an iterative assignment approach that maximizes the accuracy of comparative evaluation under limited evaluation resources.
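
To make the pipeline concrete, below is a minimal Python sketch of the two-step idea: each evaluator's ratings are converted into a partial ranking (which is invariant to that evaluator's personal rating scale), and the partial rankings are then aggregated into a total list. The normalized Borda-style aggregation shown here is an illustrative choice, not necessarily the aggregation method proposed in the paper.

```python
from collections import defaultdict

def extract_partial_lists(ratings):
    """For each evaluator, sort the entities they rated by their own
    scores, producing a partial ranking (best first). Using ranks
    instead of raw scores removes each evaluator's individual bias
    in rating scale (harsh vs. lenient)."""
    return [sorted(scores, key=scores.get, reverse=True)
            for scores in ratings.values()]

def aggregate_borda(partial_lists):
    """Aggregate partial lists into a total list via a Borda-style
    count, normalized by list length so that evaluators who rated
    more entities do not dominate. (Illustrative assumption.)"""
    score = defaultdict(float)
    seen = defaultdict(int)
    for plist in partial_lists:
        n = len(plist)
        for pos, entity in enumerate(plist):
            # Entities nearer the top of a partial list earn more credit.
            score[entity] += (n - 1 - pos) / max(n - 1, 1)
            seen[entity] += 1
    # Rank by average credit per appearance; ties broken by entity id.
    return sorted(score, key=lambda e: (-score[e] / seen[e], e))

# Toy example: a lenient and a harsh evaluator disagree on absolute
# scores, but their extracted partial rankings aggregate cleanly.
ratings = {
    "alice": {"A": 5, "B": 3},  # lenient scale
    "bob":   {"B": 2, "C": 1},  # harsh scale
    "carol": {"A": 4, "C": 2},
}
print(aggregate_borda(extract_partial_lists(ratings)))  # -> ['A', 'B', 'C']
```

The resulting total list could then feed the rating-estimation step mentioned above, e.g., by fitting scores consistent with the aggregate order; that fitting procedure is not specified here and would be an assumption.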