Modeling Local Coherence: An Entity-Based Approach

This article proposes a novel framework for representing and measuring local coherence. Central to this approach is the entity-grid representation of discourse, which captures patterns of entity distribution in a text. The algorithm introduced in the article automatically abstracts a text into a set of entity transition sequences and records distributional, syntactic, and referential information about discourse entities. We re-conceptualize coherence assessment as a learning task and show that our entity-based representation is well-suited for ranking-based generation and text classification tasks. Using the proposed representation, we achieve good performance on text ordering, summary coherence evaluation, and readability assessment.
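The entity-grid idea can be illustrated with a minimal sketch. It assumes each sentence has already been reduced to a mapping from entity to syntactic role, which in practice would come from a parser and a coreference system. The role labels (`S` for subject, `O` for object, `X` for other, `-` for absent), the helper names `build_grid` and `transition_features`, and the toy document are illustrative assumptions, not the article's implementation.

```python
from collections import Counter
from itertools import product

# Assumed role inventory: subject, object, other, absent.
ROLES = ("S", "O", "X", "-")

def build_grid(sentences):
    """Build one column per entity, with one role per sentence.

    `sentences` is a list of dicts mapping entity -> role; an entity
    missing from a sentence is marked "-" (absent).
    """
    entities = sorted({entity for sent in sentences for entity in sent})
    return {e: [sent.get(e, "-") for sent in sentences] for e in entities}

def transition_features(grid, n=2):
    """Return the relative frequency of every length-n role transition."""
    counts = Counter()
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
    total = sum(counts.values())
    return {
        t: (counts[t] / total if total else 0.0)
        for t in product(ROLES, repeat=n)
    }

# Hypothetical three-sentence document, already annotated with roles.
doc = [
    {"Microsoft": "S", "market": "O"},
    {"Microsoft": "O", "evidence": "S"},
    {"Microsoft": "S", "products": "X"},
]

grid = build_grid(doc)
features = transition_features(grid, n=2)
```

Each document then becomes a fixed-length vector of transition frequencies (16 values for transitions of length two over four roles), which is the kind of representation a ranking or classification learner could consume.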
