Towards the Data-driven System for Rhetorical Parsing of Russian Texts

This paper presents the results of the first experimental evaluation of machine learning models trained on RuRSTreebank, the first Russian corpus annotated within the RST framework. Various lexical, quantitative, morphological, and semantic features were used. For rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model achieves the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives, which are derived from a connectives lexicon for Russian and from other sources.
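For readers unfamiliar with the reported metric, the following is a minimal sketch of how macro F1 is computed and of one simple way two classifiers' outputs can be combined. It is not the authors' implementation: the averaging of class probabilities (soft voting), the function names `macro_f1` and `soft_vote`, and the relation labels are all assumptions made for illustration, since the abstract does not say how the CatBoost and SVM models are combined.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(probs_a, probs_b):
    """Average two models' class probabilities and take the argmax per item.

    Assumption: both models emit a dict of label -> probability over the
    same label set. The combination rule is illustrative only.
    """
    merged = []
    for pa, pb in zip(probs_a, probs_b):
        avg = {label: (pa[label] + pb[label]) / 2 for label in pa}
        merged.append(max(avg, key=avg.get))
    return merged


# Hypothetical relation labels, for illustration only.
gold = ["cause", "cause", "contrast", "elaboration"]
pred = ["cause", "contrast", "contrast", "elaboration"]
score = macro_f1(gold, pred)
```

Because macro F1 weights every class equally, rare relation types affect the score as much as frequent ones, which makes it a natural choice when relation classes are imbalanced.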
