A model to predict lexical complexity and to grade words (Un modèle pour prédire la complexité lexicale et graduer les mots) [in French]

Analysing lexical complexity is a task that has mainly attracted the attention of psycholinguists and language teachers. More recently, it has drawn growing interest in the field of Natural Language Processing (NLP), in particular in automatic text simplification. The aim of text simplification is to identify words and structures that a target audience may find difficult to understand and to provide automated tools that simplify such content. This article focuses on the lexical side of the problem: it identifies a set of predictors of lexical complexity, whose effectiveness is assessed with a correlational analysis. The best of these variables are then integrated into a model that predicts the difficulty of words for learners of French.

Keywords: lexical complexity, morphological analysis, graded words, lexical resources.
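
As an illustration of the kind of correlational analysis mentioned above, the following minimal sketch computes a Spearman rank correlation between one candidate predictor and a graded difficulty level. Everything in it is an assumption made for the example: the abstract does not say which correlation coefficient the authors used, and the words, log-frequency values, and levels in `toy_words` are invented, not taken from the article's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: (word, log frequency, difficulty level from 1 to 6).
# These numbers are invented for illustration only.
toy_words = [
    ("maison", 5.1, 1),
    ("chat", 4.6, 1),
    ("voiture", 4.8, 2),
    ("néanmoins", 3.2, 4),
    ("subreptice", 0.9, 6),
    ("acquiescer", 1.7, 5),
]

log_freq = [w[1] for w in toy_words]
level = [w[2] for w in toy_words]
rho = spearman(log_freq, level)
print(f"Spearman rho between log frequency and level: {rho:.3f}")
```

On this toy data the coefficient is strongly negative, meaning that rarer words tend to sit at higher difficulty levels. In a setting like the one the abstract describes, predictors with the strongest correlations would be the natural candidates to feed into the predictive model.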
