Automatic analysis of syntactic complexity in second language writing

We describe a computational system for the automatic analysis of syntactic complexity in second language writing using fourteen measures that have been explored or proposed in studies of second language development. The system takes a written language sample as input and produces fourteen indices of the sample's syntactic complexity based on these measures. Because the system is designed with advanced second language proficiency research in mind, it is developed and evaluated on college-level second language writing data from the Written English Corpus of Chinese Learners (Wen et al. 2005). Experimental results show that the system achieves very high reliability on unseen test data from the corpus. We illustrate the system's use in an example application that investigates whether, and to what extent, each of these measures significantly differentiates between proficiency levels.
