A First Machine Learning Approach to Pronominal Anaphora Resolution in Basque

In this paper we present the first machine learning approach to pronominal anaphora resolution in Basque. We consider several classifiers in order to find the one that best fits the characteristics of the language under examination. We do not restrict our study to the classifiers typically used for this task; we also consider others, such as Random Forest and VFI, in order to make a general comparison. We determine the feature vector obtained with our linguistic processing system and analyze the contribution of different subsets of features, as well as the weight of each feature used in the task.
