Modelling Discourse Relations for Arabic

We present the first algorithms to automatically identify explicit discourse connectives and the relations they signal in Arabic text. First, we show that in Arabic news, unlike in English, most adjacent sentences are linked by explicit connectives, which makes the treatment of explicit discourse connectives highly important for Arabic. We also show that explicit Arabic discourse connectives are far more ambiguous than English ones, which makes their treatment challenging. In the second part of the paper, we present supervised algorithms for automatic discourse connective identification and discourse relation recognition. Our connective identifier based on gold-standard syntactic features achieves near-human performance. In addition, an identifier based solely on simple lexical features and automatically derived morphological and POS features performs with high reliability, which is essential for languages that do not yet have high-quality parsers. Our algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation.
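
To make the comparison point concrete, a baseline that uses only the connective surface string can be read as "always predict the relation most often seen with this connective in training". The sketch below is one possible reading under that assumption, not the paper's implementation: the function names, the fallback to the overall majority relation for unseen connectives, and the placeholder strings `conn_a`, `conn_b`, `REL_X`, `REL_Y` are all hypothetical and carry no claim about real Arabic connectives or relation labels.

```python
from collections import Counter, defaultdict


def train_surface_baseline(examples):
    """Build a majority-relation lookup from (connective, relation) pairs.

    Hypothetical helper: returns a dict mapping each connective string to
    its most frequent relation, plus the overall most frequent relation
    to use for connectives never seen in training.
    """
    per_connective = defaultdict(Counter)
    overall = Counter()
    for connective, relation in examples:
        per_connective[connective][relation] += 1
        overall[relation] += 1
    table = {
        connective: counts.most_common(1)[0][0]
        for connective, counts in per_connective.items()
    }
    fallback = overall.most_common(1)[0][0] if overall else None
    return table, fallback


def predict(table, fallback, connective):
    """Predict a relation from the connective string alone."""
    return table.get(connective, fallback)


# Invented placeholder data, for illustration only.
toy_training = [
    ("conn_a", "REL_X"),
    ("conn_a", "REL_X"),
    ("conn_a", "REL_Y"),
    ("conn_b", "REL_Y"),
    ("conn_b", "REL_Y"),
]
table, fallback = train_surface_baseline(toy_training)
```

Because such a lookup ignores context, it always misclassifies the minority readings of an ambiguous connective (here, the one `conn_a` instance labelled `REL_Y`), which is the ambiguity a context-aware relation classifier is meant to reduce.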
