A comprehensive review on feature set used for anaphora resolution

In linguistics, Anaphora Resolution (AR) is the task of identifying the antecedent of an anaphor. Put simply, it answers the question of what an expression that points back to an earlier referent actually refers to. It is considered one of the more difficult tasks in Natural Language Processing (NLP). AR's growing popularity among researchers is attributable to its strong relevance to machine translation, text summarization, chatbots, question answering, and many other applications. This paper reviews AR approaches in terms of the significant features they use to perform the task and presents the evaluation metrics used in the field. In AR, a feature is a piece of information about the anaphor, the antecedent, or the relation between them. Features represent the lexical, syntactic, semantic, and positional relationships between an anaphor and each of its possible candidate antecedents. The performance of an AR system depends heavily on the features it uses; hence, the selection of features is a critical design decision. The main emphasis of this review is to provide an overview of the features that AR systems reported in the literature use to identify anaphors and their antecedents. It is observed that syntactic information improves the accuracy of both anaphor detection and antecedent identification. The trend is now shifting from methods that depend on hand-crafted features to deep learning approaches, which try to learn feature representations automatically. The performance of deep learning is improving owing to the availability of more data and more powerful computing resources. This survey presents the state of the art for a better understanding of the AR problem from a feature-selection perspective.
The findings of this survey offer valuable insight into current trends and should help researchers who want to develop an AR system within given constraints.
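To make the idea of anaphor-antecedent features concrete, here is a minimal, hypothetical sketch in Python. It assumes mentions have already been detected and annotated with gender, number, grammatical role and a coarse semantic class by upstream tools; the `Mention` fields, the feature names, and the `WEIGHTS` values are illustrative inventions, not taken from any system reviewed in the survey. It follows a simple filter-then-rank scheme: agreement features act as hard constraints, and the remaining lexical, syntactic, semantic and positional features are combined into a weighted score.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    """A detected mention; every field is a hypothetical upstream annotation."""
    text: str
    sent_idx: int   # sentence number in the document
    tok_idx: int    # document-level index of the head token
    gender: str     # "masc", "fem", "neut" or "unknown"
    number: str     # "sg" or "pl"
    role: str       # grammatical role: "subj", "obj" or "other"
    sem_class: str  # coarse semantic class, e.g. "person", "location"


def pair_features(anaphor: Mention, cand: Mention) -> Dict[str, float]:
    """Map an (anaphor, candidate antecedent) pair to named feature values."""
    return {
        # Lexical: exact string match (mostly useful for full noun phrases).
        "string_match": float(anaphor.text.lower() == cand.text.lower()),
        # Morphological agreement; "unknown" gender is treated as compatible.
        "gender_agree": float(anaphor.gender == cand.gender
                              or "unknown" in (anaphor.gender, cand.gender)),
        "number_agree": float(anaphor.number == cand.number),
        # Syntactic: subjects are often preferred as antecedents.
        "cand_is_subject": float(cand.role == "subj"),
        # Semantic: do both mentions belong to the same coarse class?
        "sem_agree": float(anaphor.sem_class == cand.sem_class),
        # Positional: how far back the candidate is.
        "sent_dist": float(anaphor.sent_idx - cand.sent_idx),
        "tok_dist": float(anaphor.tok_idx - cand.tok_idx),
    }


# Illustrative hand-set weights (hypothetical); a learned model would fit these.
WEIGHTS = {"string_match": 1.0, "cand_is_subject": 2.0, "sem_agree": 1.5,
           "sent_dist": -1.0, "tok_dist": -0.05}


def resolve(anaphor: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Filter candidates on agreement, then return the highest-scoring one."""
    best: Optional[Mention] = None
    best_score = float("-inf")
    for cand in candidates:
        if cand.tok_idx >= anaphor.tok_idx:
            continue  # consider only mentions that precede the anaphor
        f = pair_features(anaphor, cand)
        if not (f["gender_agree"] and f["number_agree"]):
            continue  # hard constraint: agreement must hold
        score = sum(w * f[name] for name, w in WEIGHTS.items())
        if score > best_score:
            best, best_score = cand, score
    return best


# Toy example: "John met Mary at the station. He gave her a book."
john = Mention("John", 0, 0, "masc", "sg", "subj", "person")
mary = Mention("Mary", 0, 2, "fem", "sg", "obj", "person")
station = Mention("the station", 0, 5, "neut", "sg", "other", "location")
he = Mention("He", 1, 7, "masc", "sg", "subj", "person")
her = Mention("her", 1, 9, "fem", "sg", "obj", "person")

print(resolve(he, [john, mary, station]).text)       # → John
print(resolve(her, [john, mary, station, he]).text)  # → Mary
```

In a machine-learning AR system, the hand-set weights would instead be estimated by a classifier trained on annotated anaphor-antecedent pairs, and a feature-selection procedure would decide which entries of `pair_features` are worth keeping; deep learning approaches go one step further and learn such representations directly from the text.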
