Leveling L2 Texts Through Readability: Combining Multilevel Linguistic Features with the CEFR

Selecting appropriate texts for L2 (second/foreign language) learners is an important approach to enhancing motivation and, by extension, learning. There is currently no tool for classifying foreign language texts according to a language proficiency framework, which makes it difficult for students and educators to determine the precise difficulty or complexity level of an unclassified text. Taking the Chinese language as an example, this study aimed to create a readability assessment system, called the Chinese Readability Index Explorer for Chinese as a Foreign Language (CRIE-CFL), in order to level texts (that is, to sort them by proficiency level) for instructional use. The framework chosen for this project is the Common European Framework of Reference (CEFR). A team of expert CFL teachers first classified 1,578 CFL texts into their appropriate CEFR levels. A set of 30 CFL readability features was then developed or drawn from previous research and ranked by importance using F-scores. A support vector machine model was then trained by sequentially integrating the features into the model to optimize accuracy. The empirical evaluation of CRIE-CFL revealed average exact- and adjacent-level accuracies of 74.97% and 99.62%, respectively, for predicting the expert classification of a text. The functionalities of CRIE-CFL are introduced and discussed.
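The two evaluation figures reported in the abstract can be illustrated with a minimal sketch. It assumes that adjacent-level accuracy counts a prediction as correct when it falls within one level of the expert label (exact matches included), and that the six standard CEFR levels A1 to C2 are used; the abstract states neither explicitly. The function name `level_accuracies` and the sample labels are hypothetical and are not taken from CRIE-CFL.

```python
from typing import Sequence, Tuple

# Assumption: the six standard CEFR levels, ordered from lowest to highest.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_accuracies(
    expert: Sequence[str],
    predicted: Sequence[str],
    levels: Sequence[str] = CEFR_LEVELS,
) -> Tuple[float, float]:
    """Return (exact-level accuracy, adjacent-level accuracy) as fractions.

    Hypothetical helper: "adjacent" is taken to mean that the predicted
    level is at most one step away from the expert level.
    """
    if len(expert) != len(predicted) or len(expert) == 0:
        raise ValueError("expert and predicted must be non-empty and equal in length")

    # Map each level name to its position on the ordered scale.
    rank = {level: i for i, level in enumerate(levels)}

    exact = 0
    adjacent = 0
    for e, p in zip(expert, predicted):
        gap = abs(rank[e] - rank[p])
        if gap == 0:
            exact += 1
        if gap <= 1:
            adjacent += 1

    n = len(expert)
    return exact / n, adjacent / n


# Illustrative labels only; these are not data from the study.
expert_labels = ["A1", "A2", "B1", "B2"]
model_labels = ["A1", "B1", "B1", "C2"]
exact_acc, adjacent_acc = level_accuracies(expert_labels, model_labels)
print(f"exact: {exact_acc:.2%}, adjacent: {adjacent_acc:.2%}")
```

Under this definition the adjacent-level figure can never be lower than the exact-level figure, which is consistent with the reported 74.97% and 99.62%.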
