Machine Learning for Detecting Pronominal Anaphora Ambiguity in NL Requirements

Automated or semi-automated analysis of requirements specification documents written in Natural Language (NL) has long been desirable. An important precursor to this goal is the identification and correction of potentially ambiguous requirements statements. Pronominal anaphora ambiguity is one such type of pragmatic, or referential, ambiguity in NL requirements that needs attention. However, identifying such ambiguous requirements statements is challenging because they are relatively rare. We address this challenge by framing the task as a classification problem: given a corpus of requirements statements that are potentially ambiguous owing to pronominal anaphora, identify those that are actually ambiguous. We show how a classifier can be trained in a semi-supervised manner to detect such instances of pronominal anaphora ambiguity. In our study, a Bayesian network classification algorithm achieves a recall of 95%.
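To make the idea of semi-supervised training concrete, the following is a minimal sketch and not the procedure used in the study. It assumes self-training as the semi-supervised scheme and substitutes a naive Bayes model (the simplest form of Bayesian network) for the Bayesian network classifier. The feature names (`two_candidates`, `same_number`, `one_candidate`), the labels, the confidence threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def train_nb(examples):
    """Fit a multinomial naive Bayes model on (features, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    feat_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for feats, label in examples:
        feat_counts[label].update(feats)
        vocab.update(feats)
    return label_counts, feat_counts, vocab

def predict_proba(model, feats):
    """Return {label: posterior probability}, using add-one smoothing."""
    label_counts, feat_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # The extra +1 reserves probability mass for unseen features.
        denom = sum(feat_counts[label].values()) + len(vocab) + 1
        score = math.log(n / total)
        for f in feats:
            score += math.log((feat_counts[label][f] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly add confidently predicted unlabeled items."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train_nb(labeled)
    for _ in range(max_rounds):
        confident, rest = [], []
        for feats in pool:
            probs = predict_proba(model, feats)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            (confident if p >= threshold else rest).append((feats, label))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [feats for feats, _ in rest]
        model = train_nb(labeled)
    return model

def recall(model, test_set, positive="ambiguous"):
    """Fraction of truly positive items that the model labels positive."""
    tp = fn = 0
    for feats, gold in test_set:
        if gold != positive:
            continue
        probs = predict_proba(model, feats)
        if max(probs, key=probs.get) == positive:
            tp += 1
        else:
            fn += 1
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical toy data: each statement is reduced to a list of feature names.
seed = [(["two_candidates", "same_number"], "ambiguous"),
        (["one_candidate"], "unambiguous")]
pool = [["two_candidates", "same_number"], ["one_candidate"]]
model = self_train(seed, pool, threshold=0.6)
print(recall(model, [(["two_candidates"], "ambiguous")]))
```

Recall is the natural measure to report here because ambiguous statements are rare: missing one is costlier than flagging an unambiguous statement for manual review.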
