Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL

Reading is known to be an essential task in language learning, but finding an appropriate text for each learner is far from easy. In this context, automatic procedures can support the teacher's work. Some tools exist for English, but at present there are none for French as a foreign language (FFL). In this paper, we present an original approach to assessing the readability of FFL texts, using NLP techniques and extracts from FFL textbooks as our corpus. Two logistic regression models based on lexical and grammatical features are explored and give quite good predictions on new texts. The results show a slight superiority of multinomial logistic regression over the proportional odds model.
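
For readers unfamiliar with the two models being compared, their standard textbook forms are sketched below. The notation is an assumption made for illustration and is not taken from the paper: $Y \in \{1, \dots, K\}$ stands for the difficulty level of a text, $x$ for its vector of lexical and grammatical features, and the sign in the second equation follows one common convention.

```latex
% Multinomial logistic regression: one coefficient vector per level,
% with \alpha_K = 0 and \beta_K = 0 fixed for identifiability.
P(Y = k \mid x) = \frac{\exp(\alpha_k + \beta_k^{\top} x)}
                       {\sum_{j=1}^{K} \exp(\alpha_j + \beta_j^{\top} x)},
\qquad k = 1, \dots, K

% Proportional odds model: a single coefficient vector shared by all
% cumulative splits, with ordered thresholds
% \alpha_1 \le \dots \le \alpha_{K-1}.
\log \frac{P(Y \le k \mid x)}{P(Y > k \mid x)} = \alpha_k - \beta^{\top} x,
\qquad k = 1, \dots, K - 1
```

The multinomial model treats the levels as unordered categories and estimates $K - 1$ coefficient vectors. The proportional odds model uses the ordering of the levels and estimates a single vector $\beta$, which makes it more parsimonious but more restrictive.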
