The Accuracy of Expert-System Diagnoses of Mathematical Problem Solutions

Expert systems have the potential to help computer-based testing programs give qualitative feedback about examinee performance on constructed-response items. This study evaluated the accuracy of such feedback for algebra word problems. Responses from Graduate Record Examinations examinees were diagnostically analyzed by an expert system and by four human judges. Human judges agreed highly among themselves about whether errors were present in a solution, to a lesser degree when errors were categorized generally, and to only a limited degree on the detailed characterization of those errors. The expert system agreed very closely with the judges in characterizing responses as right or wrong, but somewhat less so in classifying errors under either the specific or the general scheme. The accuracy of automatic qualitative judgments may be increased by using more general diagnostic categories and by integrating information from other sources, including performance on diverse item types.
