Linking Discourse Marker Inventories

This paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. Because much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks. Moreover, it will make it possible to apply techniques for translation inference to this particular group of lexical resources, which was previously largely neglected in the context of Linguistic Linked (Open) Data.
