
APPLYING CONTENT SIMILARITY METRICS TO CORPUS DATA: DIFFERENCES BETWEEN NATIVE AND NON‐NATIVE SPEAKER RESPONSES TO A TOEFL® INTEGRATED WRITING PROMPT

For many purposes, it is useful to collect a corpus of texts all produced in response to the same stimulus, whether to measure performance (as on a test) or to test hypotheses about population differences. This paper examines several methods for measuring similarities in phrasing and content and demonstrates that these methods can be used to identify population differences between native and non-native speakers of English in a writing task.

Paul Deane | Olga Gurevich
