A Coherence Model Based on Syntactic Patterns

We introduce a model of coherence that captures the intentional discourse structure of a text. Our work is based on the hypothesis that syntax provides a proxy for the communicative goal of a sentence, and that the sequence of sentences in a coherent discourse should therefore exhibit detectable structural patterns. Results show that our method has high discriminating power for separating coherent from incoherent news articles, reaching accuracies of up to 90%. We also show that our syntactic patterns are correlated with manual annotations of intentional structure for academic conference articles and can successfully predict the coherence of the abstract, introduction, and related work sections of these articles.
