“How”—beyond the “what”, towards the “why”: A rule-assessment approach to achievement testing

Abstract The advantages of a rule-assessment approach to the interpretation of achievement test results were demonstrated using an S-P chart with coded error types. A simulated data set was used to address two problems: similar total test scores that result from completely different misapprehensions, and correct answers that result from incorrect rules of operation. Although the overall quality of the test used here, as measured by conventional psychometric indices, proved satisfactory, it was shown that the traditional interpretation, which refers to total test scores, can be misleading, especially when adaptive remediation is sought.

It is well known in the medical sciences that a disease has several symptoms, yet several diseases can share the same symptom (e.g., high fever). Consequently, no responsible physician would prescribe the same medicine for two patients suffering from different diseases merely because both present with high fever. Similarly, when two students with different misapprehensions obtain the same total test score, should the teacher prescribe the same remediation for both?

Although the method of diagnostic test construction was beyond the scope of this paper, it should be noted that test design is a crucial matter that ultimately determines the quality of the diagnosis. One must therefore choose the items carefully in order to maximize the information about the rules of operation underlying the students' responses. A task specification chart (Birenbaum & Shaw, 1985) may serve as a useful tool in the process of test construction. As the chart illustrates, when an item yields the same result under various "bugs", its contribution to rule assessment is questionable.
Although in reality test results are contaminated by noise from careless errors or strategy changes during the test, the overall identification rate achieved by diagnostic tests ranges between 70% and 80% (Tatsuoka, 1984). Similarly, current AI diagnostic systems such as DEBUGGY and DPF are reported to identify 80%–90% of student errors (VanLehn, 1981; Ohlsson & Langley, 1985). Such a rate seems to justify the tedious work involved in constructing a diagnostic tool.
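Why identification is nontrivial can be seen with the classic "smaller-from-larger" subtraction bug (Brown & Burton, 1978): the erroneous rule produces correct answers on every item that requires no borrowing, so the bug is invisible on such items. A minimal sketch with invented two-digit items:

```python
# Illustrative sketch (invented items, not the paper's actual test):
# the "smaller-from-larger" bug subtracts, in each column, the smaller
# digit from the larger one and ignores borrowing entirely.

def smaller_from_larger(a: int, b: int) -> int:
    da, db = str(a).zfill(3), str(b).zfill(3)
    digits = [abs(int(x) - int(y)) for x, y in zip(da, db)]
    return int("".join(str(d) for d in digits))

# First two items need no borrowing; last two do.
items = [(57, 23), (85, 41), (62, 38), (40, 17)]

for a, b in items:
    truth, buggy = a - b, smaller_from_larger(a, b)
    mark = "correct" if buggy == truth else "wrong"
    print(f"{a} - {b}: true={truth}, buggy={buggy} ({mark})")
```

On this four-item set the buggy rule scores 2 out of 4, matching a student who genuinely knows the algorithm but slips twice; only the pattern of which items failed, not the total, separates the two cases.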

[1]  P. Langley, et al.  Identifying Solution Paths in Cognitive Diagnosis, 1985.

[2]  R. Sternberg, et al.  The Representation and Processing of Information in Real-Time Verbal Comprehension, 1985.

[3]  R. Siegler  Three Aspects of Cognitive Development, 1976, Cognitive Psychology.

[4]  Kikumi K. Tatsuoka, et al.  Spotting Erroneous Rules of Operation by the Individual Consistency Index, 1983.

[5]  Kurt VanLehn, et al.  Felicity Conditions for Human Skill Acquisition: Validating an AI-Based Theory, 1983.

[6]  Kikumi K. Tatsuoka, et al.  Rule Space, the Product Space of Two Score Components in Signed-Number Subtraction: An Approach to Dealing with Inconsistent Use of Erroneous Rules, 1982.

[7]  Menucha Birenbaum, et al.  Task Specification Chart: A Key to a Better Understanding of Test Results, 1985.

[8]  Robert J. Sternberg, et al.  Component Processes in Analogical Reasoning, 1977.

[9]  D. Harnisch  Item Response Patterns: Applications for Educational Practice, 1983.

[10]  Isaac I. Bejar, et al.  Educational Diagnostic Assessment, 1984.

[11]  Susan E. Whitely, et al.  Measuring Aptitude Processes with Multicomponent Latent Trait Models, 1981.

[12]  R. Glaser, et al.  The Future of Testing: A Research Agenda for Cognitive Psychology and Psychometrics, 1981.

[13]  R. Sternberg  Intelligence, Information Processing and Analogical Reasoning: The Componential Analysis of Human Abilities, 1977.

[14]  Derek H. Sleeman, et al.  Modelling Student's Problem Solving, 1981, Artif. Intell.

[15]  Earl C. Butterfield, et al.  Theoretically Based Psychometric Measures of Inductive Reasoning, 1985.

[16]  Hayes  Identifying the Organization of Writing Processes, 1980.

[17]  Susan E. Whitely, et al.  Multicomponent Latent Trait Models for Ability Tests, 1980.

[18]  David Bartholomae, et al.  The Study of Error, 1980.

[19]  Mina P. Shaughnessy  Some Needed Research on Writing, 1977.

[20]  Kikumi K. Tatsuoka, et al.  A Probabilistic Model for Diagnosing Misconceptions by the Pattern Classification Approach, 1985.

[21]  John Seely Brown, et al.  Diagnostic Models for Procedural Bugs in Basic Mathematical Skills, 1978, Cogn. Sci.