Summarizing Review Scores of "Unequal" Reviewers

A frequently encountered problem in decision making is the following review problem: review a large number of objects and select a small number of the best ones. An example is selecting conference papers from a large number of submissions. This problem involves two sub-problems: assigning reviewers to each object, and summarizing the reviewers’ scores into an overall score that reflects the quality of the object. In this paper, we address the score summarization sub-problem for the scenario where a small number of reviewers evaluate each object. Simply averaging the scores may not work well, since with only a few reviewers even a single one can sway the average significantly. We recognize that reviewers are not necessarily on an equal footing and propose the notion of “leniency” to model these differences among reviewers. Two insights underpin our approach: (1) the “leniency” of a reviewer depends on how s/he evaluates objects as well as on how other reviewers evaluate the same set of objects, and (2) the “leniency” of a reviewer and the “quality” of the objects s/he evaluates exhibit a mutual dependency. These insights motivate us to develop a model that solves for “leniency” and “quality” simultaneously. We study the effectiveness of this model on a real-life dataset.
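To make the mutual dependency between “leniency” and “quality” concrete, the following Python sketch alternates between two updates: an object’s quality is the leniency-adjusted average of its scores, and a reviewer’s leniency is the average amount by which his/her scores exceed the current quality estimates of the objects s/he reviewed. This is only an illustration of the mutual-reinforcement idea under a simple additive assumption, not the model developed in the paper; the function name, the additive score model, and the update rules are our own for illustration.

```python
# A minimal sketch of mutual reinforcement between reviewer "leniency" and
# object "quality". Assumptions (not from the paper): scores are numeric,
# score ~ quality(object) + leniency(reviewer), and the two quantities are
# estimated by alternating averages until they stabilize.
from collections import defaultdict

def estimate_leniency_and_quality(scores, iterations=50):
    """scores: dict mapping (reviewer, obj) -> numeric score."""
    reviewers = {r for r, _ in scores}
    objects = {o for _, o in scores}

    leniency = {r: 0.0 for r in reviewers}      # start: every reviewer neutral
    quality = {o: 0.0 for o in objects}

    by_reviewer = defaultdict(list)             # reviewer -> [(obj, score)]
    by_object = defaultdict(list)               # obj -> [(reviewer, score)]
    for (r, o), s in scores.items():
        by_reviewer[r].append((o, s))
        by_object[o].append((r, s))

    for _ in range(iterations):
        # Quality: average of an object's scores after removing each
        # reviewer's current leniency.
        for o, revs in by_object.items():
            quality[o] = sum(s - leniency[r] for r, s in revs) / len(revs)
        # Leniency: how far a reviewer's scores sit above the current
        # quality estimates of the objects s/he reviewed.
        for r, objs in by_reviewer.items():
            leniency[r] = sum(s - quality[o] for o, s in objs) / len(objs)
        # Pin down the additive ambiguity by centering leniencies at zero.
        mean_l = sum(leniency.values()) / len(leniency)
        for r in leniency:
            leniency[r] -= mean_l

    return leniency, quality

# Hypothetical example: reviewer "r2" scores the same papers about one
# point higher than "r1", so r2 should come out as more lenient.
scores = {("r1", "p1"): 6, ("r2", "p1"): 7,
          ("r1", "p2"): 4, ("r2", "p2"): 5,
          ("r2", "p3"): 8, ("r3", "p3"): 6}
leniency, quality = estimate_leniency_and_quality(scores)
print(leniency)
print(quality)
```

In this sketch, a reviewer who consistently scores above the consensus ends up with a positive leniency, and that surplus is discounted from the quality estimates of the objects s/he reviewed, which is the intuition behind insights (1) and (2) above.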