Length of Textual Response as a Construct-Irrelevant Response Strategy: The Case of Shell Language. Research Report. ETS RR-13-07.

This paper applies a natural language processing tool to study a potential construct-irrelevant response strategy, namely the use of shell language. Although the study is motivated by the impending increase in the volume of scoring of student responses from assessments to be developed in response to the Race to the Top initiative, the data for the study were obtained from the GRE® Analytical Writing measure. The functioning of the shell detection tool was first evaluated in two ways: by applying it to a corpus of over 200,000 issue and argument essays, and by examining whether the shell language score agreed with the characterization of shell by two scoring experts. On the basis of these evaluations, we concluded that the tool worked well. The tool was then used to select essays for rescoring to determine whether the presence of shell language had affected the operational scores they received. We found no evidence of such an effect. However, we did find a leniency effect in the operational scores: the essays that were rescored as part of this project received lower scores on rescoring than they had received operationally. The validity implications of these results are discussed.
