A Survey of Discourse Representations for Chinese Discourse Annotation

A key element in computational discourse analysis is the design of a formal representation of a text's discourse structure. Since machine learning is now the dominant approach, it is important to identify a discourse representation that can support large-scale annotation. This survey systematically analyzes existing discourse representation theories to evaluate whether they are suitable for annotating Chinese text. Specifically, we introduce two properties, expressiveness and practicality, and use them to compare the representations of theories based on rhetorical relations with those of theories based on entity relations. The comparison reveals the linguistic and computational characteristics of each theory. We conclude that none of the existing theories is well suited to scalable Chinese discourse annotation, because none is both expressive and practical. A new discourse representation is therefore needed, one that balances expressiveness and practicality and covers both rhetorical relations and entity relations. Building on these conclusions, the survey discusses several preliminary proposals for representing discourse structure that are worth pursuing.
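To make the distinction between the two families of relations concrete, the following is a minimal sketch of a toy annotation record that stores both kinds side by side. It is not the representation proposed in the survey; the class names (`RhetoricalRelation`, `DiscourseAnnotation`), the relation label `"Contrast"`, and the choice of elementary discourse units (EDUs) as the unit of annotation are all hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RhetoricalRelation:
    # Hypothetical schema: a labeled link between two spans of EDUs.
    label: str              # e.g. "Contrast" (label inventory is an assumption)
    arg1: tuple[int, ...]   # indices of the EDUs forming the first argument
    arg2: tuple[int, ...]   # indices of the EDUs forming the second argument


@dataclass
class DiscourseAnnotation:
    edus: list[str]
    relations: list[RhetoricalRelation] = field(default_factory=list)
    # Entity relations: entity id -> indices of EDUs that mention the entity.
    entity_chains: dict[str, list[int]] = field(default_factory=dict)

    def add_mention(self, entity_id: str, edu_index: int) -> None:
        self.entity_chains.setdefault(entity_id, []).append(edu_index)

    def entity_links(self) -> list[tuple[int, int]]:
        """Return pairs of EDUs connected by consecutive mentions of one entity."""
        links = set()
        for indices in self.entity_chains.values():
            ordered = sorted(set(indices))
            for a, b in zip(ordered, ordered[1:]):
                links.add((a, b))
        return sorted(links)


# Toy example: "Zhang San bought a book, / but he did not read it. / The book is thick."
doc = DiscourseAnnotation(edus=["张三买了一本书，", "但是他没读。", "那本书很厚。"])
doc.relations.append(RhetoricalRelation("Contrast", (0,), (1,)))  # rhetorical relation
doc.add_mention("zhangsan", 0)  # 张三
doc.add_mention("zhangsan", 1)  # 他 (coreferent pronoun)
doc.add_mention("book", 0)      # 一本书
doc.add_mention("book", 2)      # 那本书
print(doc.entity_links())
```

Here EDUs 1 and 2 are connected only through the Contrast relation and EDUs 1 and 3 only through the shared entity "book", which illustrates why a representation limited to one relation type can miss part of the structure.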
