Serving CS Formative Feedback on Assessments Using Simple and Practical Teacher-Bootstrapped Error Models

Author(s): Stephens-Martinez, Kristin | Advisor(s): Fox, Armando

Abstract: The demand for computing education in post-secondary education is growing. However, teaching staff hiring is not keeping pace, leading to increasing class sizes. As computers become ubiquitous, classes are following suit by increasing their use of technology. These two defining factors of scaled classes require us to reconsider teaching practices that originated in small classes with little technology. Rather than seeing scaled classes as a problem that needs managing, we propose they are an opportunity to collect and analyze large, high-dimensional data sets and to conduct experiments at scale.

One way classes are increasing their use of technology is by moving content delivery and assessment administration online. Massive Open Online Courses (MOOCs) have taken this to an extreme by delivering all material online, having no face-to-face interaction, and enrolling thousands of students at once. To understand how this changes the information needs of the teacher, we surveyed MOOC teachers and compared our results to prior work that ran similar surveys among teachers of smaller online courses. While our results were similar, we found that the MOOC teachers surveyed valued qualitative data, such as forum activity and student surveys, more than quantitative data such as grades. A potential reason is that teachers found quantitative data insufficient for monitoring class dynamics, such as problems with course material and students' thought processes. They needed a source of data that required less upfront knowledge of what to look for and how to find it. With such data, their understanding of the students and the class situation could be more holistic.

Since qualitative data such as forum activity and surveys carry an inherent selection bias, we focused on required, constructed-response assessments in the course. This choice reduced selection bias, required less upfront knowledge, and focused attention on measuring how well students were learning the material. Also, since MOOCs have a high proportion of auditors, we moved to studying a large local class to obtain a complete sample.

We applied qualitative and quantitative methods to analyze wrong answers from constructed-response, code-tracing question sets delivered through an automated grading system. Using emergent coding, we defined tags to represent ways that a student might arrive at a wrong answer and applied them to our data set. Because the wrong answers we identified as frequent occurred at a much higher rate than infrequent ones, analyzing only these frequent wrong answers provides a representative overview of the data. In addition, a content expert is more likely to be able to tag a frequent wrong answer than a random wrong answer.

Using the association between wrong answers and tags, we built a student error model and designed a hint intervention within the automated grading system. We deployed an in situ experiment in a large introductory computer science course to understand the effectiveness of the model's parameters and compared two kinds of hints: reteaching and knowledge integration [28]. A reteaching hint re-explained the concept(s) associated with the tag. A knowledge integration hint focused on nudging the student in the right direction without re-explaining anything, for example by reminding them of a concept or asking them to compare two aspects of the assessment. We found it was straightforward to implement and deploy our intervention experiment because of the existing class technology. In addition, we found that co-occurrence provides useful information for propagating tags to wrong answers that we did not inspect. However, we were unable to find evidence that our hints improved student performance on post-test questions compared to no hints at all. Therefore, we performed a preliminary, exploratory analysis to understand potential reasons for our null results and to inform future work.

We believe scaled classes are a prime opportunity to study learning. This work is an example of how to take advantage of that opportunity: first collect and analyze data from a scaled class, then deploy a scaled in situ intervention using the class's own technology. With this work, we encourage other researchers to take advantage of scaled classes and hope it can serve as a starting point for how to do so.
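
The abstract compresses the tag-and-hint pipeline into a few sentences, so a sketch may help make it concrete. The Python below is a minimal, illustrative sketch only: the data layout, frequency threshold, tag names, hint wording, and the hint_for helper are assumptions made for this example and are not the dissertation's implementation, which lived inside the course's automated grading system.

```python
from collections import Counter, defaultdict

# Hypothetical log of wrong answers from an automated grading system:
# (student_id, question_id, wrong_answer). The structure is an assumption
# for illustration, not the dissertation's actual data format.
submissions = [
    ("s1", "q1", "0"), ("s2", "q1", "0"), ("s3", "q1", "0"),
    ("s1", "q2", "[1, 2]"), ("s2", "q2", "[1, 2]"),
    ("s4", "q1", "None"),
]

# 1. Select "frequent" wrong answers (given by at least FREQUENCY_THRESHOLD
#    students) so a content expert only hand-tags a small, representative
#    subset. The threshold value is an assumption.
FREQUENCY_THRESHOLD = 3
counts = Counter((q, ans) for _, q, ans in submissions)
frequent = {key for key, n in counts.items() if n >= FREQUENCY_THRESHOLD}
assert frequent == {("q1", "0")}  # the only answer crossing the threshold here

# 2. Expert-assigned tags for frequent wrong answers: emergent codes that
#    describe how a student might arrive at that answer. Tag names are made up.
expert_tags = {("q1", "0"): {"off-by-one"}}

# 3. Propagate tags to uninspected wrong answers that co-occur with tagged
#    ones, i.e. are given by the same students.
answers_by_student = defaultdict(set)
for student, q, ans in submissions:
    answers_by_student[student].add((q, ans))

propagated = defaultdict(set)
for answers in answers_by_student.values():
    tags_seen = set().union(*(expert_tags.get(a, set()) for a in answers))
    for a in answers:
        if a not in expert_tags:
            propagated[a] |= tags_seen

# 4. Map tags to the two hint styles compared in the experiment: a reteaching
#    hint re-explains the concept, while a knowledge-integration ("ki") hint
#    nudges without re-explaining. The hint text is illustrative only.
hints = {
    "off-by-one": {
        "reteach": "Recall that range(n) produces 0 through n - 1.",
        "ki": "Compare the last index your loop visits with len(lst) - 1.",
    },
}

def hint_for(question, answer, style="ki"):
    """Return a hint for a wrong answer if any of its tags has one."""
    key = (question, answer)
    for tag in expert_tags.get(key, set()) | propagated.get(key, set()):
        if tag in hints:
            return hints[tag][style]
    return None  # fall back to generic "incorrect" feedback

print(hint_for("q2", "[1, 2]"))        # knowledge-integration hint via propagation
print(hint_for("q1", "0", "reteach"))  # reteaching hint via the expert's own tag
```

The design point the sketch illustrates is that the expert only ever tags the small set of frequent wrong answers; co-occurrence then extends that effort to wrong answers nobody inspected, which is the behavior the dissertation evaluates.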

[1] Neil T. Heffernan et al. The ASSISTments Ecosystem: Building a Platform that Brings Scientists and Teachers Together for Minimally Invasive Research on Human Learning and Teaching, 2014, International Journal of Artificial Intelligence in Education.

[2] Johan Jeuring et al. Towards a Systematic Review of Automated Feedback Generation for Programming Exercises, 2016, ITiCSE.

[3] Ryan Shaun Joazeiro de Baker et al. Stupid Tutoring Systems, Intelligent Humans, 2016, International Journal of Artificial Intelligence in Education.

[4] Aaron J. Smith et al. My Digital Hand: A Tool for Scaling Up One-to-One Peer Teaching in Support of Computer Science Learning, 2017, SIGCSE.

[5] Jane E. Caldwell et al. Clickers in the large classroom: current research and best-practice tips, 2007, CBE Life Sciences Education.

[6] Félix Hernández-del-Olmo et al. Enhancing E-Learning Through Teacher Support: Two Experiences, 2009, IEEE Transactions on Education.

[7] Niels Pinkwart et al. A Review of AI-Supported Tutoring Approaches for Learning Programming, 2013, Advanced Computational Methods for Knowledge Engineering.

[8] Owen Conlan et al. Visualizing Narrative Structures and Learning Style Information in Personalized e-Learning Systems, 2007, Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007).

[9] A. Huberman et al. Qualitative Data Analysis: A Methods Sourcebook, 1994.

[10] Juha Sorva et al. Exploring programming misconceptions: an analysis of student mistakes in visual program simulation exercises, 2012, Koli Calling.

[11] Libby Gerard et al. Designing Automated Guidance to Promote Productive Revision of Science Explanations, 2017, International Journal of Artificial Intelligence in Education.

[12] Marcia C. Linn et al. Distinguishing complex ideas about climate change: knowledge integration vs. specific guidance, 2016.

[13] Paul Swoboda et al. World Wide Web - Course Tool: An Environment for Building WWW-Based Courses, 1996, Comput. Networks.

[14] Pieter Abbeel et al. Gradescope: A Fast, Flexible, and Fair System for Scalable Assessment of Handwritten Work, 2017, L@S.

[15] Sebastián Ventura et al. Educational Data Mining: A Review of the State of the Art, 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[16] David E. Pritchard et al. Studying Learning in the Worldwide Classroom: Research into edX's First MOOC, 2013.

[17] Dragan Gasevic et al. Generating actionable predictive models of academic performance, 2016, LAK.

[18] Armando Fox et al. Taking Advantage of Scale by Analyzing Frequent Constructed-Response, Code Tracing Wrong Answers, 2017, ICER.

[19] Kikumi K. Tatsuoka et al. A Probabilistic Model for Diagnosing Misconceptions By The Pattern Classification Approach, 1985.

[20] Raymond Lister et al. On the Number of Attempts Students Made on Some Online Programming Exercises During Semester and their Subsequent Performance on Final Exam Questions, 2016, ITiCSE.

[21] Ville Isomöttönen et al. Revisiting rainfall to explore exam questions and performance on CS1, 2015, Koli Calling.

[22] Randy Pausch et al. Alice: a 3-D tool for introductory programming concepts, 2000.

[23] Armando Fox et al. Monitoring MOOCs: which information sources do instructors value?, 2014, L@S.

[24] Larry Ambrose et al. The power of feedback, 2002, Healthcare Executive.

[25] William J. Gibbs et al. A Visualization Tool for Managing and Studying Online Communications, 2006, J. Educ. Technol. Soc.

[26] Donald E. Powers et al. Immediate Feedback and Opportunity to Revise Answers to Open-Ended Questions, 2010.

[27] Ann L. Brown et al. How people learn: Brain, mind, experience, and school, 1999.

[28] Rebecca Ferguson et al. Learning analytics: drivers, developments and challenges, 2012.

[29] Kurt VanLehn et al. Repair Theory: A Generative Theory of Bugs in Procedural Skills, 1980, Cogn. Sci.

[30] Mario Antonioletti et al. e-learner tracking: Tools for discovering learner behaviour, 2004.

[31] Juha Sorva et al. Notional machines and introductory programming education, 2013, TOCE.

[32] Neil T. Heffernan et al. Extending Knowledge Tracing to Allow Partial Credit: Using Continuous versus Binary Nodes, 2013, AIED.

[33] Amber Settle et al. Some Trouble with Transparency: An Analysis of Student Errors with Object-oriented Python, 2016, ICER.

[34] Raymond J. Mooney et al. Refinement-based student modeling and automated bug library construction, 1996.

[35] Ying Xie et al. Solving College Calculus Problems: A Study of Two Types of Instructor’s Feedback, 2010.

[36] Yigal Attali et al. Immediate Feedback and Opportunity to Revise Answers, 2011.

[37] Björn Hartmann et al. Should your MOOC forum use a reputation system?, 2014, CSCW.

[38] K. Tatsuoka. Rule Space: An Approach for Dealing with Misconceptions Based on Item Response Theory, 1983.

[39] Vania Dimitrova et al. CourseVis: Externalising Student Information to Facilitate Instructors in Distance Learning, 2003.

[40] Juha Sorva et al. Visual program simulation in introductory programming education, 2012.

[41] Chris Piech et al. Deconstructing disengagement: analyzing learner subpopulations in massive open online courses, 2013, LAK '13.

[42] Petri Ihantola et al. Do we know how difficult the rainfall problem is?, 2015, Koli Calling.

[43] Raymond Lister et al. Relationships between reading, tracing and writing skills in introductory programming, 2008, ICER '08.

[44] Anna N. Rafferty et al. Automated guidance for student inquiry, 2016.

[45] Neil Brown et al. Investigating novice programming mistakes: educator beliefs vs. student data, 2014, ICER '14.

[46] E. Mory. Feedback research revisited, 2004.

[47] Anne Venables et al. A closer look at tracing, explaining and code writing skills in the novice programmer, 2009, ICER '09.

[48] Leonidas J. Guibas et al. Syntactic and Functional Variability of a Million Code Submissions in a Machine Learning MOOC, 2013, AIED Workshops.

[49] Ryan S. Baker et al. The State of Educational Data Mining in 2009: A Review and Future Visions, 2009, EDM 2009.

[50] Brian Dorn et al. Aggregate Compilation Behavior: Findings and Implications from 27,698 Users, 2015, ICER.

[51] Simon et al. Multiple-choice vs free-text code-explaining examination questions, 2014, Koli Calling.

[52] Vania Dimitrova et al. CourseVis: A graphical student monitoring tool for supporting instructors in web-based distance courses, 2007, Int. J. Hum. Comput. Stud.

[53] G. Brosvic et al. Immediate Feedback during Academic Testing, 2001, Psychological Reports.

[54] Elliot Soloway et al. PROUST: Knowledge-Based Program Understanding, 1984, IEEE Transactions on Software Engineering.

[55] Vania Dimitrova et al. Informing The Design of a Course Data Visualisator: an Empirical Study, 2003.

[56] Francisco J. García-Peñalvo et al. Semantic Spiral Timelines Used as Support for e-Learning, 2009, J. Univers. Comput. Sci.

[57] Colin J. Fidge et al. Further evidence of a relationship between explaining, tracing and writing skills in introductory programming, 2009, ITiCSE.

[58] Mark Guzdial et al. The FCS1: a language independent assessment of CS1 knowledge, 2011, SIGCSE.

[59] R. Dihoff et al. The Role of Feedback During Academic Testing: The Delay Retention Effect Revisited, 2003.

[60] Raymond Lister et al. Not seeing the forest for the trees: novice programmers and the SOLO taxonomy, 2006, ITICSE '06.

[61] V. Shute. Focus on Formative Feedback, 2008.

[62] Elliot Soloway et al. Simulating Student Programmers, 1989, IJCAI.

[63] Michael C. Rodriguez. Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, 2005.

[64] Mark Guzdial et al. Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment, 2016, ICER.

[65] Elliot Soloway et al. Classifying bugs is a tricky business, 1983.

[66] Peter Brusilovsky et al. Individualized exercises for self-assessment of programming knowledge: An evaluation of QuizPACK, 2005, JERC.

[67] Roy D. Pea et al. Promoting active learning & leveraging dashboards for curriculum assessment in an OpenEdX introductory CS course for middle school, 2014, L@S.

[68] Amruth N. Kumar et al. Explanation of step-by-step execution as feedback for problems on program analysis, and its generation in model-based problem-solving tutors, 2006.

[69] Riccardo Mazza et al. Exploring Usage Analysis in Learning Systems: Gaining Insights From Visualisations, 2005.

[70] Urban Nuldén et al. Visualizing learning activities to support tutors, 1999, CHI Extended Abstracts.

[71] Mike Sharkey et al. Course correction: using analytics to predict course success, 2012, LAK '12.

[72] John DeNero et al. Problems Before Solutions: Automated Problem Clarification at Scale, 2015, L@S.

[73] Douglas B. Clark et al. WISE design for knowledge integration, 2003.

[74] K. Cotton et al. Monitoring Student Learning in the Classroom, 2009.

[75] Andrew F. Heckler et al. Factors Affecting Learning of Vector Math from Computer-Based Practice: Feedback Complexity and Prior Knowledge, 2016.

[76] Jiawei Han et al. Re-examination of interestingness measures in pattern mining: a unified framework, 2010, Data Mining and Knowledge Discovery.

[77] Christopher D. Hundhausen et al. The Normalized Programming State Model: Predicting Student Performance in Computing Courses Based on Programming Behavior, 2015, ICER.