Assessing the Quality of Student-Generated Content at Scale: A Comparative Analysis of Peer-Review Models

Engaging students in creating learning resources has demonstrated pedagogical benefits. However, to use a repository of student-generated content (SGC) effectively, a selection process is needed to separate high-quality from low-quality resources, as some of the resources created by students can be ineffective, inappropriate, or incorrect. A common and scalable approach is a peer-review process in which students are asked to assess the quality of resources authored by their peers. Given that the judgments of students, as experts-in-training, cannot be wholly relied upon, a redundancy-based method is widely employed in which the same assessment task is given to multiple students. However, this approach introduces a new challenge, referred to as the consensus problem: How can we assign a final quality rating to a resource given ratings by multiple students? To address this challenge, we investigate the predictive performance of 18 inference models across five well-established categories of consensus approaches for inferring the quality of SGC at scale. The analysis is based on the engagement of 2141 undergraduate students across five courses in creating 12 803 resources and 77 297 peer reviews. Results indicate that the quality of reviews is quite diverse and that students tend to overrate. Consequently, simple statistics such as the mean and median fail to identify poor-quality resources. Findings further suggest that incorporating advanced probabilistic and text analysis methods to infer the reviewers' reliability and the reviews' quality improves performance; however, there is still an evident need for instructor oversight and for training students to write compelling and reliable reviews.
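The consensus problem can be illustrated with a minimal sketch. It is not one of the 18 models evaluated in the paper; the ratings, the reliability weights, the 1-5 scale, and the `weighted_mean` helper are all hypothetical and chosen only to show how overrating can hide a poor resource from the mean and median, while a reliability-weighted aggregate may still flag it.

```python
from statistics import mean, median


def weighted_mean(ratings, weights):
    """Aggregate ratings, giving more influence to higher-weighted reviewers."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(r * w for r, w in zip(ratings, weights)) / total


# Hypothetical 1-5 ratings given to one poor-quality resource by five students.
# Three reviewers overrate it; two reviewers rate it low.
ratings = [5, 4, 5, 2, 1]

# Hypothetical reliability weights (assumed to come from some inference model,
# for example one that estimates how well each reviewer agrees with experts).
reliability = [0.2, 0.3, 0.2, 0.9, 0.9]

simple_mean = mean(ratings)        # pulled upward by the overraters
simple_median = median(ratings)    # also sits with the overrating majority
consensus = weighted_mean(ratings, reliability)

print(f"mean={simple_mean:.2f} median={simple_median} weighted={consensus:.2f}")
```

With an assumed acceptance threshold of 3, the mean and median would both accept this resource, whereas the weighted aggregate would not. How the weights are obtained is the hard part, and it is what the probabilistic and text analysis methods mentioned above aim to estimate.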
