Optimal Spot-Checking for Improving Evaluation Accuracy of Peer Grading Systems

Peer grading, in which students evaluate one another's assignments, offers a promising solution for scaling both evaluation and learning to large educational systems. A key challenge in peer grading is motivating peers to grade diligently. While existing spot-checking (SC) mechanisms can prevent peer collusion, where peers coordinate to report uninformative grades, they unrealistically assume that all peers have the same grading reliability and cost. This paper studies the general Optimal Spot-Checking (OptSC) problem of determining the probability with which each assignment is checked so as to maximize the evaluation accuracy of grades aggregated from peers, taking into consideration 1) peers' heterogeneous characteristics, and 2) peers' strategic grading behaviors as they maximize their own utility. We prove that the bilevel OptSC problem is NP-hard. By exploiting peers' grading behaviors, we first formulate a single-level relaxation that approximates OptSC. By further exploiting structural properties of the relaxed problem, we propose an efficient algorithm for the relaxation, which in turn yields a good approximation of the original OptSC. Extensive experiments on both synthetic and real datasets show significant advantages of the proposed algorithm over existing approaches.
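
To make the setting concrete, the following is a minimal, hypothetical sketch (in Python) of budgeted spot-check allocation in the spirit of the problem above. It is not the paper's algorithm: the single-level relaxation and the structural properties it exploits are not reproduced here. The sketch assumes each peer j has a known reliability reliability[j] and a threshold threshold[j], the minimum check probability that makes diligent grading a best response for that peer; these names, and the greedy gain-to-cost heuristic, are illustrative assumptions rather than the authors' method.

    # Hypothetical sketch of the OptSC setting; assumed model, not the paper's algorithm.
    import numpy as np

    def allocate_spot_checks(graders, reliability, threshold, budget):
        """Greedily assign check probabilities x[i] to assignments.

        graders     : list of lists; graders[i] holds the peers grading assignment i
        reliability : reliability[j] in (0.5, 1], peer j's accuracy when diligent
        threshold   : threshold[j] > 0, the assumed minimum check probability
                      that makes peer j prefer grading diligently
        budget      : total expected number of spot checks available
        """
        n = len(graders)
        x = np.zeros(n)

        def gain(i):
            # Accuracy recovered on assignment i if its graders become diligent
            # instead of reporting uninformative (coin-flip) grades.
            return sum(reliability[j] - 0.5 for j in graders[i])

        def cost(i):
            # Checking with the largest threshold among assignment i's graders
            # induces diligence for all of them (under the assumed model).
            return max(threshold[j] for j in graders[i])

        # Spend the checking budget on assignments with the best gain/cost ratio.
        order = sorted(range(n), key=lambda i: gain(i) / cost(i), reverse=True)
        for i in order:
            if budget >= cost(i):
                x[i] = cost(i)
                budget -= cost(i)
        return x

    # Toy usage: 3 assignments, each graded by 2 of 3 peers.
    x = allocate_spot_checks(
        graders=[[0, 1], [1, 2], [0, 2]],
        reliability=[0.9, 0.7, 0.8],
        threshold=[0.2, 0.5, 0.3],
        budget=0.6,
    )

A greedy ratio heuristic like this ignores the strategic coupling across assignments that the bilevel formulation captures, which is precisely what makes the exact OptSC problem NP-hard and motivates the paper's relaxation-based algorithm.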
