Counterfactual Evaluation of Peer-Review Assignment Policies

Peer review assignment algorithms aim to match research papers to suitable expert reviewers, with the goal of maximizing the quality of the resulting reviews. A key challenge in designing effective assignment policies is evaluating how changes to the assignment algorithm translate into changes in review quality. In this work, we treat recently proposed policies that introduce randomness into peer-review assignment (in order to mitigate fraud) as a valuable opportunity to evaluate counterfactual assignment policies. Specifically, we exploit the fact that such randomized assignments assign positive probability to observing the reviews that many assignment policies of interest would produce. To address challenges in applying standard off-policy evaluation methods, such as violations of positivity, we introduce novel methods for partial identification based on monotonicity and Lipschitz smoothness assumptions on the mapping between reviewer-paper covariates and outcomes. We apply our methods to peer-review data from two computer science venues: the TPDP'21 workshop (95 papers and 35 reviewers) and the AAAI'22 conference (8,450 papers and 3,145 reviewers). We consider estimates of (i) the effect on review quality of changing weights in the assignment algorithm, e.g., weighting reviewers' bids vs. textual similarity (between the reviewer's past papers and the submission), and (ii) the "cost of randomization", capturing the difference in expected quality between the perturbed and unperturbed optimal match. We find that placing higher weight on text similarity results in higher review quality and that introducing randomization in the reviewer-paper assignment only marginally reduces review quality. Our methods for partial identification may be of independent interest, while our off-policy approach can likely find use in evaluating a broad class of algorithmic matching systems.
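The core off-policy idea can be illustrated with a small Python sketch. It is not the authors' implementation: the function `estimate_policy_value` and its arguments are hypothetical, and the sketch rests on simplifying assumptions stated in the code, namely that each pair's review quality depends only on that pair (no interference), that a target policy's value is the mean quality over the pairs it assigns, and that quality is a noise-free, L-Lipschitz function of pair covariates such as bid and text similarity. Target pairs that the randomized policy can assign are handled with a Horvitz-Thompson (inverse-probability-weighted) sum; target pairs with zero assignment probability, where positivity fails, are bounded using the Lipschitz assumption together with a known outcome range. The output is an estimated interval without confidence intervals, and monotonicity-based bounds are not shown.

```python
import numpy as np


def estimate_policy_value(in_target, p, assigned, y, x, lipschitz, y_lo, y_hi):
    """Hypothetical sketch: estimated bounds on the mean review quality of a
    deterministic target assignment, using data from a randomized assignment.

    All inputs are indexed by candidate reviewer-paper pair:
      in_target : bool, pair is assigned by the target (counterfactual) policy
      p         : marginal probability that the randomized policy assigns the pair
      assigned  : bool, pair was assigned in the realized randomized assignment
      y         : observed review quality (ignored where not assigned)
      x         : pair covariates, shape (n_pairs, d), e.g. (bid, text similarity)
    Assumptions (simplifications, not the paper's exact method): no interference
    between pairs, and quality is a noise-free L-Lipschitz function of x
    taking values in [y_lo, y_hi].
    """
    in_target = np.asarray(in_target, dtype=bool)
    p = np.asarray(p, dtype=float)
    assigned = np.asarray(assigned, dtype=bool)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(p), -1)  # 1-D covariates become (n, 1)

    # Horvitz-Thompson sum: unbiased for the total quality of target pairs with p > 0.
    seen = in_target & (p > 0) & assigned
    ht_sum = np.sum(y[seen] / p[seen])

    # Target pairs the randomized policy never assigns (positivity violated).
    uncovered = np.flatnonzero(in_target & (p == 0))
    obs_x, obs_y = x[assigned], y[assigned]
    lo_sum, hi_sum = 0.0, 0.0
    for k in uncovered:
        if len(obs_y) == 0:
            # No observations at all: fall back to worst-case range bounds.
            lo, hi = y_lo, y_hi
        else:
            # Lipschitz bounds: f(x_k) lies within L * distance of every observed value.
            dist = np.linalg.norm(obs_x - x[k], axis=1)
            lo = max(y_lo, np.max(obs_y - lipschitz * dist))
            hi = min(y_hi, np.min(obs_y + lipschitz * dist))
        lo_sum += lo
        hi_sum += hi

    n = in_target.sum()
    return (ht_sum + lo_sum) / n, (ht_sum + hi_sum) / n


# Toy usage with four candidate pairs and a single covariate.
lo, hi = estimate_policy_value(
    in_target=[True, True, False, True],
    p=[0.5, 0.5, 0.5, 0.0],
    assigned=[True, False, True, False],
    y=[2.0, np.nan, 3.0, np.nan],
    x=[[0.0], [1.0], [2.0], [3.0]],
    lipschitz=1.0, y_lo=1.0, y_hi=5.0,
)
print(lo, hi)
```

In this toy input, the one observed target pair contributes 2.0 / 0.5 = 4.0 to the Horvitz-Thompson sum, and the never-assignable pair at covariate 3.0 is bounded to [2.0, 4.0] by its distances to the two observed pairs, so the mean over the three target pairs is bounded by roughly [2.0, 8/3]. Larger Lipschitz constants widen the bounds on uncovered pairs, which is the usual trade-off between assumption strength and interval width in partial identification.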
