Identifying implicit discourse relations by combining natural and artificial data

This article presents the first experiments on French in the automatic identification of implicit discourse relations (i.e., relations not marked by a connective). Our systems use annotated implicit examples as well as artificial implicit examples, obtained from explicit examples by removing the connective, a method introduced by Marcu and Echihabi (2002). Previous studies on English show that using artificial data for training substantially degrades performance on natural data, which reflects major differences between the two data distributions. This finding, which also holds for French, led us to consider several methods, inspired by domain adaptation, that aim to combine the two kinds of data more effectively. We evaluate these methods on the ANNODIS corpus: our best system reaches 41.7% accuracy, a significant gain of 4.4% over a model that uses only natural data.
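The abstract only names the two ingredients, so what follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It illustrates (i) how an artificial implicit example can be derived from an explicit one by deleting the connective and keeping the relation it signals as the label, following the idea of Marcu and Echihabi (2002), and (ii) one standard domain-adaptation technique for mixing the two data sources, the feature augmentation of Daumé (2007), which may differ from the methods actually evaluated in this article. The connective lexicon `CONNECTIVE_TO_RELATION`, its relation labels, the word-pair features and all function names are hypothetical.

```python
from itertools import product

# Hypothetical lexicon of French connectives, each assumed to signal a
# single relation unambiguously. Illustrative entries only.
CONNECTIVE_TO_RELATION = {
    "mais": "contrast",
    "parce que": "explanation",
    "donc": "result",
}


def make_artificial_implicit(arg1, arg2):
    """Derive an artificial implicit example from an explicit one.

    arg2 is assumed to start with the connective. Returns
    (arg1, arg2 without the connective, relation label), or None
    when arg2 does not start with a connective from the lexicon.
    """
    lowered = arg2.lower()
    # Check longer connectives first so multi-word ones are matched whole.
    for conn in sorted(CONNECTIVE_TO_RELATION, key=len, reverse=True):
        if lowered.startswith(conn + " "):
            stripped = arg2[len(conn):].lstrip()
            return arg1, stripped, CONNECTIVE_TO_RELATION[conn]
    return None


def word_pair_features(arg1, arg2):
    """Cross-argument word pairs, in the spirit of Marcu and Echihabi (2002)."""
    toks1 = arg1.lower().split()
    toks2 = arg2.lower().split()
    return {f"{w1}|{w2}" for w1, w2 in product(toks1, toks2)}


def augment(features, domain):
    """Feature augmentation (Daumé 2007): one shared copy plus one copy
    specific to the example's source domain ("natural" or "artificial")."""
    return {f"shared:{f}" for f in features} | {f"{domain}:{f}" for f in features}


if __name__ == "__main__":
    a1, a2, rel = make_artificial_implicit("Il pleuvait", "mais nous sommes sortis")
    print(a2, rel)  # nous sommes sortis contrast
    feats = augment(word_pair_features(a1, a2), domain="artificial")
    print(len(feats))  # 12
```

At prediction time, natural (annotated) examples would be augmented with `domain="natural"`: the `shared:` copies let the classifier transfer what it learns from the more plentiful artificial data, while the domain-specific copies can absorb the distributional differences between the two sources.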

[1] Alex Lascarides, et al. Logics of Conversation, 2005, Studies in natural language processing.

[2] Xun Wang, et al. Implicit Discourse Relation Recognition by Selecting Typical Training Examples, 2012, COLING.

[3] Hal Daumé, et al. Frustratingly Easy Domain Adaptation, 2007, ACL.

[4] William C. Mann, et al. Rhetorical Structure Theory: Toward a functional theory of text organization, 1988.

[5] Daumé, et al. Frustratingly Easy Semi-Supervised Domain Adaptation, 2010.

[6] J. Winter. Using the Student's t-test with extremely small sample sizes, 2022, Practical Assessment, Research, and Evaluation.

[7] Livio Robaldo, et al. The Penn Discourse TreeBank 2.0, 2008, LREC.

[8] Alex Acero, et al. Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot, 2006, Comput. Speech Lang.

[9] Hwee Tou Ng, et al. Recognizing Implicit Discourse Relations in the Penn Discourse Treebank, 2009, EMNLP.

[10] Anders Søgaard, et al. Semi-Supervised Learning and Domain Adaptation in Natural Language Processing, 2013.

[11] Mitsuru Ishizuka, et al. HILDA: A Discourse Parser Using Support Vector Machine Classification, 2010, Dialogue Discourse.

[12] Vera Demberg, et al. Implicitness of Discourse Relations, 2012, COLING.

[13] Ani Nenkova, et al. Using Syntax to Disambiguate Explicit Discourse Connectives in Text, 2009, ACL.

[14] Hwee Tou Ng, et al. A PDTB-styled end-to-end discourse parser, 2012, Natural Language Engineering.

[15] Alex Lascarides, et al. Using automatically labelled examples to classify rhetorical relations: an assessment, 2022.

[16] Owen Rambow, et al. Building and Refining Rhetorical-Semantic Relation Models, 2007, HLT-NAACL.

[17] Alex Lascarides, et al. Exploiting Linguistic Cues to Classify Rhetorical Relations, 2005.

[18] Richard Simon, et al. Bias in error estimation when using cross-validation for model selection, 2006, BMC Bioinformatics.

[19] Pascal Denis, et al. Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort, 2009, PACLIC.

[20] Kenji Sagae, et al. Analysis of Discourse Structure with Syntactic Dependencies and Data-Driven Shift-Reduce Parsing, 2009, IWPT.

[21] Ani Nenkova, et al. Automatic sense prediction for implicit discourse relations in text, 2009, ACL.

[22] Charlotte Roze. Vers une algèbre des relations de discours, 2013.

[23] G. Miller, et al. Cognitive science, 1981, Science.

[24] G. Meade. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory, 2001.

[25] Pascal Denis, et al. Constrained Decoding for Text-Level Discourse Parsing, 2012, COLING.

[26] Daniel Marcu, et al. An Unsupervised Approach to Recognizing Discourse Relations, 2002, ACL.

[27] James J. Jiang. A Literature Survey on Domain Adaptation of Statistical Classifiers, 2007.

[28] Tobias Scheffer, et al. Error Estimation and Model Selection, 1999, Künstliche Intell.

[29] Claudia Soria, et al. Lexical marking of discourse relations - some experimental findings, 1998, COLING 1998.

[30] Joakim Nivre, et al. Benchmarking of Statistical Dependency Parsers for French, 2010, COLING.

[31] Adam L. Berger, et al. A Maximum Entropy Approach to Natural Language Processing, 1996, CL.

[32] Francisco Herrera, et al. A unifying view on dataset shift in classification, 2012, Pattern Recognit.

[33] Franck Thollard, et al. Proceedings of COLING, 2004.

[34] Ludovic Tanguy, et al. An empirical resource for discovering cognitive principles of discourse organisation: the ANNODIS corpus, 2012, LREC.

[35] Daniel Marcu, et al. Domain Adaptation for Statistical Classifiers, 2006, J. Artif. Intell. Res.

[36] Connecteurs, encodage conceptuel et encodage procédural, 2002.

[37] Claire Cardie, et al. Improving Implicit Discourse Relation Recognition Through Feature Set Optimization, 2012, SIGDIAL Conference.

[38] Bonnie L. Webber, et al. D-LTAG: extending lexicalized TAG to discourse, 2004, Cogn. Sci.