Lexical Tightness and Text Complexity

We present a computational notion of lexical tightness that measures the global cohesion of content words in a text. Lexical tightness represents the degree to which a text tends to use words that are highly inter-associated in the language. We demonstrate the utility of this measure for estimating text complexity, as measured by US school grade-level designations of texts. Lexical tightness correlates strongly with grade level in a collection of expertly rated reading materials. It captures aspects of prose complexity that are not covered by classic readability indexes, especially for literary texts. We also present initial findings on the utility of this measure for automated estimation of the complexity of poetry.
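To make the idea of "highly inter-associated" words concrete, the following is a minimal sketch of one way such a score could be computed. The abstract does not name the association measure, so the sketch assumes normalized pointwise mutual information (NPMI) averaged over all pairs of distinct content-word types. The function names, the probability tables, and the choice to score never-co-occurring pairs as -1 are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def npmi(p_xy, p_x, p_y):
    """Normalized PMI in [-1, 1] for one word pair.

    Assumption: a pair that never co-occurs is scored -1, and a pair
    that always co-occurs (p_xy == 1) is scored 1.
    """
    if p_xy <= 0.0:
        return -1.0
    if p_xy >= 1.0:
        return 1.0
    pmi = math.log2(p_xy / (p_x * p_y))
    return pmi / -math.log2(p_xy)


def lexical_tightness(content_words, unigram_p, pair_p):
    """Mean NPMI over all pairs of distinct content-word types.

    unigram_p: hypothetical dict mapping word -> probability.
    pair_p: hypothetical dict mapping (word_a, word_b) -> co-occurrence
            probability, with each key pair in alphabetical order.
    """
    types = sorted(set(content_words))
    scores = [
        npmi(pair_p.get((a, b), 0.0), unigram_p[a], unigram_p[b])
        for a, b in combinations(types, 2)
    ]
    # A text with fewer than two content-word types has no pairs to score.
    return sum(scores) / len(scores) if scores else 0.0


# Toy probabilities, invented purely for illustration.
unigram_p = {"bark": 0.1, "dog": 0.1, "leash": 0.1, "tax": 0.1}
pair_p = {
    ("bark", "dog"): 0.05,
    ("bark", "leash"): 0.02,
    ("dog", "leash"): 0.05,
}

tight_text = ["dog", "bark", "leash", "dog"]
loose_text = ["dog", "bark", "tax"]

tight_score = lexical_tightness(tight_text, unigram_p, pair_p)
loose_score = lexical_tightness(loose_text, unigram_p, pair_p)
print(tight_score > loose_score)
```

In this toy setting the text whose words co-occur often ("dog", "bark", "leash") receives a higher score than the text containing an unrelated word ("tax"). In practice the probabilities would be estimated from a large corpus.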
