Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments

Previous neural coherence models have focused on identifying semantic relations between adjacent sentences, but they lack the means to exploit structural information. In this work, we propose a coherence model that takes discourse structure into account without relying on human annotations. We approximate a linguistic theory of coherence, Centering theory, and use it to track changes of focus between discourse segments. Our model first identifies the focus of each sentence, recognized with regard to its context, and constructs structural relationships among discourse segments by tracking changes of focus. The model then incorporates this structural information into a structure-aware transformer. We evaluate our model on two tasks, automated essay scoring and assessing writing quality. Our results demonstrate that our model, built on top of a pretrained language model, achieves state-of-the-art performance on both tasks. We further statistically examine the trees induced for texts assigned different quality scores. Finally, we investigate what our model learns with respect to the claims of the underlying theory.
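As a rough illustration of the segment-construction step described above, here is a minimal Python sketch, not the paper's implementation: the names (`Segment`, `build_segments`) and the per-sentence foci are hypothetical, and the sketch covers only the flat grouping of sentences by focus, mirroring Centering's CONTINUE and SHIFT transitions. The full model would additionally arrange these segments hierarchically and feed the resulting structure into a structure-aware transformer.

```python
# Minimal sketch (assumed interface, not the authors' code): group sentences
# into discourse segments by tracking Centering-style focus shifts.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Segment:
    """A discourse segment: a run of sentences sharing the same focus."""
    focus: str
    sentence_ids: List[int] = field(default_factory=list)

def build_segments(foci: List[str]) -> List[Segment]:
    """Extend the current segment while the focus continues (CONTINUE);
    open a new segment whenever the focus shifts (SHIFT)."""
    segments: List[Segment] = []
    for i, focus in enumerate(foci):
        if segments and segments[-1].focus == focus:
            segments[-1].sentence_ids.append(i)   # focus continues
        else:
            segments.append(Segment(focus, [i]))  # focus shifts
    return segments

if __name__ == "__main__":
    # Hypothetical per-sentence foci, e.g. predicted from contextual
    # sentence representations rather than hand-annotated.
    foci = ["essay", "essay", "scoring", "scoring", "essay"]
    for seg in build_segments(foci):
        print(f"focus={seg.focus!r:10} sentences={seg.sentence_ids}")
```

Running the demo yields three segments ([0, 1], [2, 3], [4]), since the focus shifts twice; in the paper's setting these segment boundaries would then constrain which positions attend to one another in the transformer.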
