Serving CS Formative Feedback on Assessments Using Simple and Practical Teacher-Bootstrapped Error Models

Author(s): Stephens-Martinez, Kristin | Advisor(s): Fox, Armando

Abstract: The demand for computing education in post-secondary education is growing. However, teaching staff hiring is not keeping pace, leading to increasing class sizes. As computers become ubiquitous, classes are following suit by increasing their use of technology. These two defining factors of scaled classes require us to reconsider teaching practices that originated in small classes with little technology. Rather than seeing scaled classes as a problem that needs managing, we propose they are an opportunity to collect and analyze large, high-dimensional data sets and to conduct experiments at scale.

One way classes are increasing their use of technology is by moving content delivery and assessment administration online. Massive Open Online Courses (MOOCs) have taken this to an extreme by delivering all material online, having no face-to-face interaction, and enrolling thousands of students at once. To understand how this changes the information needs of the teacher, we surveyed MOOC teachers and compared our results to prior work that ran similar surveys among teachers of smaller online courses. While our results were similar, we found that the MOOC teachers surveyed valued qualitative data, such as forum activity and student surveys, more than quantitative data such as grades. A potential reason is that teachers found quantitative data insufficient for monitoring class dynamics, such as problems with course material and students' thought processes. They needed a source of data that required less upfront knowledge of what to look for and how to find it. With such data, their understanding of the students and the class situation could be more holistic.

Since qualitative data such as forum activity and surveys carry an inherent selection bias, we focused on required, constructed-response assessments in the course. This choice reduced selection bias, required less upfront knowledge, and focused attention on measuring how well students were learning the material. Also, since MOOCs have a high proportion of auditors, we moved to studying a large local class to obtain a complete sample.

We applied qualitative and quantitative methods to analyze wrong answers from constructed-response, code-tracing question sets delivered through an automated grading system. Using emergent coding, we defined tags to represent ways that a student might arrive at a wrong answer and applied them to our data set. Because the wrong answers we identified as frequent occurred at a much higher rate than infrequent ones, analyzing only these frequent wrong answers provides a representative overview of the data. In addition, a content expert is more likely to be able to tag a frequent wrong answer than a random wrong answer.

Using the association between wrong answers and tags, we built a student error model and designed a hint intervention within the automated grading system. We deployed an in situ experiment in a large introductory computer science course to understand the effectiveness of the model's parameters and compared two kinds of hints: reteaching and knowledge integration [28]. A reteaching hint re-explained the concept(s) associated with the tag. A knowledge integration hint focused on nudging the student in the right direction without re-explaining anything, for example by reminding them of a concept or asking them to compare two aspects of the assessment. We found it was straightforward to implement and deploy our intervention experiment because of the existing class technology. In addition, we found that co-occurrence provides useful information for propagating tags to wrong answers that we did not inspect. However, we were unable to find evidence that our hints improved student performance on post-test questions compared to no hints at all. Therefore, we performed a preliminary, exploratory analysis to understand potential reasons for our null results and to inform future work.

We believe scaled classes are a prime opportunity to study learning. This work is an example of how to take advantage of that opportunity: first collect and analyze data from a scaled class, then deploy a scaled in situ intervention using the class's own technology. With this work, we encourage other researchers to take advantage of scaled classes and hope it can serve as a starting point for how to do so.
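
The abstract compresses the tag-and-hint pipeline into a few sentences, so a sketch may help make it concrete. The Python below is a minimal, illustrative sketch only: the data layout, frequency threshold, tag names, hint wording, and the hint_for helper are assumptions made for this example and are not the dissertation's implementation, which lived inside the course's automated grading system.

```python
from collections import Counter, defaultdict

# Hypothetical log of wrong answers from an automated grading system:
# (student_id, question_id, wrong_answer). The structure is an assumption
# for illustration, not the dissertation's actual data format.
submissions = [
    ("s1", "q1", "0"), ("s2", "q1", "0"), ("s3", "q1", "0"),
    ("s1", "q2", "[1, 2]"), ("s2", "q2", "[1, 2]"),
    ("s4", "q1", "None"),
]

# 1. Select "frequent" wrong answers (given by at least FREQUENCY_THRESHOLD
#    students) so a content expert only hand-tags a small, representative
#    subset. The threshold value is an assumption.
FREQUENCY_THRESHOLD = 3
counts = Counter((q, ans) for _, q, ans in submissions)
frequent = {key for key, n in counts.items() if n >= FREQUENCY_THRESHOLD}
assert frequent == {("q1", "0")}  # the only answer crossing the threshold here

# 2. Expert-assigned tags for frequent wrong answers: emergent codes that
#    describe how a student might arrive at that answer. Tag names are made up.
expert_tags = {("q1", "0"): {"off-by-one"}}

# 3. Propagate tags to uninspected wrong answers that co-occur with tagged
#    ones, i.e. are given by the same students.
answers_by_student = defaultdict(set)
for student, q, ans in submissions:
    answers_by_student[student].add((q, ans))

propagated = defaultdict(set)
for answers in answers_by_student.values():
    tags_seen = set().union(*(expert_tags.get(a, set()) for a in answers))
    for a in answers:
        if a not in expert_tags:
            propagated[a] |= tags_seen

# 4. Map tags to the two hint styles compared in the experiment: a reteaching
#    hint re-explains the concept, while a knowledge-integration ("ki") hint
#    nudges without re-explaining. The hint text is illustrative only.
hints = {
    "off-by-one": {
        "reteach": "Recall that range(n) produces 0 through n - 1.",
        "ki": "Compare the last index your loop visits with len(lst) - 1.",
    },
}

def hint_for(question, answer, style="ki"):
    """Return a hint for a wrong answer if any of its tags has one."""
    key = (question, answer)
    for tag in expert_tags.get(key, set()) | propagated.get(key, set()):
        if tag in hints:
            return hints[tag][style]
    return None  # fall back to generic "incorrect" feedback

print(hint_for("q2", "[1, 2]"))        # knowledge-integration hint via propagation
print(hint_for("q1", "0", "reteach"))  # reteaching hint via the expert's own tag
```

The design point the sketch illustrates is that the expert only ever tags the small set of frequent wrong answers; co-occurrence then extends that effort to wrong answers nobody inspected, which is the behavior the dissertation evaluates.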

[1] Neil T. Heffernan et al. The ASSISTments Ecosystem: Building a Platform that Brings Scientists and Teachers Together for Minimally Invasive Research on Human Learning and Teaching, 2014, International Journal of Artificial Intelligence in Education.

[2] Johan Jeuring et al. Towards a Systematic Review of Automated Feedback Generation for Programming Exercises, 2016, ITiCSE.

[3] Ryan Shaun Joazeiro de Baker et al. Stupid Tutoring Systems, Intelligent Humans, 2016, International Journal of Artificial Intelligence in Education.

[4] Aaron J. Smith et al. My Digital Hand: A Tool for Scaling Up One-to-One Peer Teaching in Support of Computer Science Learning, 2017, SIGCSE.

[5] Jane E. Caldwell et al. Clickers in the large classroom: current research and best-practice tips, 2007, CBE Life Sciences Education.

[6] Félix Hernández-del-Olmo et al. Enhancing E-Learning Through Teacher Support: Two Experiences, 2009, IEEE Transactions on Education.

[7] Niels Pinkwart et al. A Review of AI-Supported Tutoring Approaches for Learning Programming, 2013, Advanced Computational Methods for Knowledge Engineering.

[8] Owen Conlan et al. Visualizing Narrative Structures and Learning Style Information in Personalized e-Learning Systems, 2007, Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007).

[9] A. Huberman et al. Qualitative Data Analysis: A Methods Sourcebook, 1994.

[10] Juha Sorva et al. Exploring programming misconceptions: an analysis of student mistakes in visual program simulation exercises, 2012, Koli Calling.

[11] Libby Gerard et al. Designing Automated Guidance to Promote Productive Revision of Science Explanations, 2017, International Journal of Artificial Intelligence in Education.

[12] Marcia C. Linn et al. Distinguishing complex ideas about climate change: knowledge integration vs. specific guidance, 2016.

[13] Paul Swoboda et al. World Wide Web - Course Tool: An Environment for Building WWW-Based Courses, 1996, Comput. Networks.

[14] Pieter Abbeel et al. Gradescope: A Fast, Flexible, and Fair System for Scalable Assessment of Handwritten Work, 2017, L@S.

[15] Sebastián Ventura et al. Educational Data Mining: A Review of the State of the Art, 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[16] David E. Pritchard et al. Studying Learning in the Worldwide Classroom: Research into edX's First MOOC, 2013.

[17] Dragan Gasevic et al. Generating actionable predictive models of academic performance, 2016, LAK.

[18] Armando Fox et al. Taking Advantage of Scale by Analyzing Frequent Constructed-Response, Code Tracing Wrong Answers, 2017, ICER.

[19] Kikumi K. Tatsuoka et al. A Probabilistic Model for Diagnosing Misconceptions By The Pattern Classification Approach, 1985.

[20] Raymond Lister et al. On the Number of Attempts Students Made on Some Online Programming Exercises During Semester and their Subsequent Performance on Final Exam Questions, 2016, ITiCSE.

[21] Ville Isomöttönen et al. Revisiting rainfall to explore exam questions and performance on CS1, 2015, Koli Calling.

[22] Randy Pausch et al. Alice: a 3-D tool for introductory programming concepts, 2000.

[23] Armando Fox et al. Monitoring MOOCs: which information sources do instructors value?, 2014, L@S.

[24] Larry Ambrose et al. The power of feedback, 2002, Healthcare Executive.

[25] William J. Gibbs et al. A Visualization Tool for Managing and Studying Online Communications, 2006, J. Educ. Technol. Soc.

[26] Donald E. Powers et al. Immediate Feedback and Opportunity to Revise Answers to Open-Ended Questions, 2010.

[27] Ann L. Brown et al. How people learn: Brain, mind, experience, and school, 1999.

[28] Rebecca Ferguson et al. Learning analytics: drivers, developments and challenges, 2012.

[29] Kurt VanLehn et al. Repair Theory: A Generative Theory of Bugs in Procedural Skills, 1980, Cogn. Sci.

[30] Mario Antonioletti et al. e-learner tracking: Tools for discovering learner behaviour, 2004.

[31] Juha Sorva et al. Notional machines and introductory programming education, 2013, TOCE.

[32] Neil T. Heffernan et al. Extending Knowledge Tracing to Allow Partial Credit: Using Continuous versus Binary Nodes, 2013, AIED.

[33] Amber Settle et al. Some Trouble with Transparency: An Analysis of Student Errors with Object-oriented Python, 2016, ICER.

[34] Raymond J. Mooney et al. Refinement-based student modeling and automated bug library construction, 1996.

[35] Ying Xie et al. Solving College Calculus Problems: A Study of Two Types of Instructor’s Feedback, 2010.

[36] Yigal Attali et al. Immediate Feedback and Opportunity to Revise Answers, 2011.

[37] Björn Hartmann et al. Should your MOOC forum use a reputation system?, 2014, CSCW.

[38] K. Tatsuoka. Rule Space: An Approach for Dealing with Misconceptions Based on Item Response Theory, 1983.

[39] Vania Dimitrova et al. CourseVis: Externalising Student Information to Facilitate Instructors in Distance Learning, 2003.

[40] Juha Sorva et al. Visual program simulation in introductory programming education, 2012.

[41] Chris Piech et al. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses, 2013, LAK '13.

[42] Petri Ihantola et al. Do we know how difficult the rainfall problem is?, 2015, Koli Calling.

[43] Raymond Lister et al. Relationships between reading, tracing and writing skills in introductory programming, 2008, ICER '08.

[44] Anna N. Rafferty et al. Automated guidance for student inquiry, 2016.

[45] Neil Brown et al. Investigating novice programming mistakes: educator beliefs vs. student data, 2014, ICER '14.

[46] E. Mory. Feedback research revisited, 2004.

[47] Anne Venables et al. A closer look at tracing, explaining and code writing skills in the novice programmer, 2009, ICER '09.

[48] Leonidas J. Guibas et al. Syntactic and Functional Variability of a Million Code Submissions in a Machine Learning MOOC, 2013, AIED Workshops.

[49] Ryan S. Baker et al. The State of Educational Data Mining in 2009: A Review and Future Visions, 2009, EDM 2009.

[50] Brian Dorn et al. Aggregate Compilation Behavior: Findings and Implications from 27,698 Users, 2015, ICER.

[51] Simon et al. Multiple-choice vs free-text code-explaining examination questions, 2014, Koli Calling.

[52] Vania Dimitrova et al. CourseVis: A graphical student monitoring tool for supporting instructors in web-based distance courses, 2007, Int. J. Hum. Comput. Stud.

[53] G. Brosvic et al. Immediate Feedback during Academic Testing, 2001, Psychological Reports.

[54] Elliot Soloway et al. PROUST: Knowledge-Based Program Understanding, 1984, IEEE Transactions on Software Engineering.

[55] Vania Dimitrova et al. Informing The Design of a Course Data Visualisator: an Empirical Study, 2003.

[56] Francisco J. García-Peñalvo et al. Semantic Spiral Timelines Used as Support for e-Learning, 2009, J. Univers. Comput. Sci.

[57] Colin J. Fidge et al. Further evidence of a relationship between explaining, tracing and writing skills in introductory programming, 2009, ITiCSE.

[58] Mark Guzdial et al. The FCS1: a language independent assessment of CS1 knowledge, 2011, SIGCSE.

[59] R. Dihoff et al. The Role of Feedback During Academic Testing: The Delay Retention Effect Revisited, 2003.

[60] Raymond Lister et al. Not seeing the forest for the trees: novice programmers and the SOLO taxonomy, 2006, ITICSE '06.

[61] V. Shute. Focus on Formative Feedback, 2008.

[62] Elliot Soloway et al. Simulating Student Programmers, 1989, IJCAI.

[63] Michael C. Rodriguez. Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, 2005.

[64] Mark Guzdial et al. Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment, 2016, ICER.

[65] Elliot Soloway et al. Classifying bugs is a tricky business, 1983.

[66] Peter Brusilovsky et al. Individualized exercises for self-assessment of programming knowledge: An evaluation of QuizPACK, 2005, JERC.

[67] Roy D. Pea et al. Promoting active learning & leveraging dashboards for curriculum assessment in an OpenEdX introductory CS course for middle school, 2014, L@S.

[68] Amruth N. Kumar et al. Explanation of step-by-step execution as feedback for problems on program analysis, and its generation in model-based problem-solving tutors, 2006.

[69] Riccardo Mazza et al. Exploring Usage Analysis in Learning Systems: Gaining Insights From Visualisations, 2005.

[70] Urban Nuldén et al. Visualizing learning activities to support tutors, 1999, CHI Extended Abstracts.

[71] Mike Sharkey et al. Course correction: using analytics to predict course success, 2012, LAK '12.

[72] John DeNero et al. Problems Before Solutions: Automated Problem Clarification at Scale, 2015, L@S.

[73] Douglas B. Clark et al. WISE design for knowledge integration, 2003.

[74] K. Cotton et al. Monitoring Student Learning in the Classroom, 2009.

[75] Andrew F. Heckler et al. Factors Affecting Learning of Vector Math from Computer-Based Practice: Feedback Complexity and Prior Knowledge, 2016.

[76] Jiawei Han et al. Re-examination of interestingness measures in pattern mining: a unified framework, 2010, Data Mining and Knowledge Discovery.

[77] Christopher D. Hundhausen et al. The Normalized Programming State Model: Predicting Student Performance in Computing Courses Based on Programming Behavior, 2015, ICER.