Reviewing and analyzing peer review Inter-Rater Reliability in a MOOC platform

Abstract
Peer assessment activities may be one of the few personalized assessment alternatives to auto-graded activities that can be implemented at scale in Massive Open Online Course (MOOC) environments. Teachers' motivation to include peer assessment in their courses may go beyond the most straightforward goal (i.e., assessment), since these activities have side benefits, such as evidencing and enhancing students' critical thinking, comprehension, and writing capabilities. However, one of the main drawbacks of peer review activities, especially when the scores are used as part of the summative assessment, is that they add a high degree of uncertainty to the grades. Motivated by this issue, this paper analyzes the reliability of all the peer assessment activities performed on UNED-COMA, the MOOC platform of the Spanish University for Distance Education (UNED). The study analyzes 63 peer assessment activities from different courses on the platform, comprising a total of 27,745 validated tasks and 93,334 peer reviews. Based on Krippendorff's alpha statistic, which measures the agreement reached between reviewers, the results clearly point to the low reliability, and therefore the low validity, of this dataset of peer reviews. We did not find that factors such as the topic of the course, the number of raters, or the number of criteria to be evaluated had a significant effect on reliability. We compare our results with other studies, discuss the potential implications of this low reliability for summative assessment, and provide some recommendations to maximize the benefit of implementing peer activities in online courses.
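To make the reliability statistic concrete, the following is a minimal sketch of how Krippendorff's alpha can be computed for one peer assessment activity. It is not the study's implementation: it assumes scores are treated as interval data (squared-difference distance), a choice the abstract does not specify, and the function name `krippendorff_alpha_interval` and the `toy_scores` data are hypothetical. Alpha is defined as 1 - D_o / D_e, where D_o is the observed disagreement among scores given to the same submission and D_e is the disagreement expected if scores were paired at random.

```python
from itertools import permutations
from math import nan


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` is a list of lists: each inner list holds the scores that one
    submission received from its reviewers; None marks a missing review.
    """
    # Only submissions with at least two scores can be compared.
    pairable = []
    for scores in units:
        present = [s for s in scores if s is not None]
        if len(present) >= 2:
            pairable.append(present)

    values = [s for unit in pairable for s in unit]
    n = len(values)
    if n < 2:
        return nan  # nothing to compare

    # Observed disagreement: ordered pairs within each submission,
    # weighted by 1 / (m_u - 1), averaged over all n pairable scores.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: ordered pairs across all pairable scores.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        return nan  # every score is identical, so alpha is undefined

    return 1.0 - d_o / d_e


# Illustrative toy data (not from the study): 15 submissions, up to 3 reviewers.
toy_scores = [
    [None, 1, None], [None, None, None], [None, 2, 2], [None, 1, 1],
    [None, 3, 3], [3, 3, 4], [4, 4, 4], [1, 3, None], [2, None, 2],
    [1, None, 1], [1, None, 1], [3, None, 3], [3, None, 3],
    [None, None, None], [3, None, 4],
]
print(round(krippendorff_alpha_interval(toy_scores), 3))  # prints 0.811
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance. The expected-disagreement loop is quadratic in the number of scores, which is acceptable for a single activity but would need a coincidence-matrix formulation for very large datasets.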
