Using Domain-Specific Corpora for Improved Handling of Ambiguity in Requirements

Ambiguity in natural-language requirements is a pervasive issue that has been studied by the requirements engineering community for more than two decades. A fully manual approach to addressing ambiguity in requirements is tedious and time-consuming, and may overlook unacknowledged ambiguity, that is, the situation where different stakeholders perceive a requirement as unambiguous but, in reality, interpret it differently. In this paper, we propose an automated approach that uses natural language processing for handling ambiguity in requirements. Our approach is based on the automatic generation of a domain-specific corpus from Wikipedia. Integrating domain knowledge, as we show in our evaluation, leads to a significant improvement in the accuracy of ambiguity detection and interpretation. We scope our work to coordination ambiguity (CA) and prepositional-phrase attachment ambiguity (PAA) because of the prevalence of these types of ambiguity in natural-language requirements [1]. We evaluate our approach on 20 industrial requirements documents, which collectively contain more than 5000 requirements from seven distinct application domains. Over this dataset, our approach detects CA and PAA with an average precision of 80% and an average recall of 89% (90% for cases of unacknowledged ambiguity). The automatic interpretations that our approach yields have an average accuracy of 85%. Compared to baselines that use generic corpora, our approach, which uses domain-specific corpora, has 33% better accuracy in ambiguity detection and 16% better accuracy in interpretation.
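To make the role of a corpus in coordination ambiguity more concrete, the following Python sketch scores how often a modifier directly precedes the second conjunct in a corpus. It is a toy illustration only, not the approach evaluated in the paper: the four-sentence `TOY_CORPUS`, the function names, and the scoring rule are all assumptions made for this example. For a phrase such as "redundant sensors and actuators", a high score hints that "redundant" may also apply to "actuators" (the wide-scope reading).

```python
from collections import Counter

# Hypothetical stand-in for a domain-specific corpus; the paper generates
# its corpus from Wikipedia, which this toy list does not attempt to mimic.
TOY_CORPUS = [
    "the satellite uses redundant sensors",
    "redundant actuators improve reliability",
    "redundant sensors and redundant actuators are required",
    "the actuators are controlled by the on-board computer",
]


def unigram_counts(sentences):
    """Count whitespace-separated, lower-cased tokens."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def bigram_counts(sentences):
    """Count adjacent token pairs within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def wide_scope_score(modifier, second_noun, sentences):
    """Fraction of occurrences of second_noun directly preceded by modifier.

    Assumed heuristic for illustration: a higher value suggests the modifier
    plausibly distributes over both conjuncts.
    """
    total = unigram_counts(sentences)[second_noun]
    if total == 0:
        return 0.0  # the corpus offers no evidence for this noun
    return bigram_counts(sentences)[(modifier, second_noun)] / total


if __name__ == "__main__":
    score = wide_scope_score("redundant", "actuators", TOY_CORPUS)
    print(f"wide-scope score: {score:.2f}")
```

With a generic corpus the pair might be rare or absent, whereas a corpus drawn from the requirements' own domain is more likely to contain it, which is the intuition behind using domain-specific corpora.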

[1]  Philip Resnik,et al.  Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res.

[2]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[3]  Fabiano Dalpiaz,et al.  Pinpointing Ambiguity and Incompleteness in Requirements Engineering via Information Visualization and NLP , 2018, REFSQ.

[4]  Annie I. Antón,et al.  Identifying and classifying ambiguity for regulatory requirements , 2014, 2014 IEEE 22nd International Requirements Engineering Conference (RE).

[5]  Stefania Gnesi,et al.  The linguistic approach to the natural language requirements quality: benefit of the use of an automatic tool , 2001, Proceedings 26th Annual NASA Goddard Software Engineering Workshop.

[6]  Iryna Gurevych,et al.  A broad-coverage collection of portable NLP components for building shareable analysis pipelines , 2014, OIAF4HLT@COLING.

[7]  Benedikt Gleich,et al.  Ambiguity Detection: Towards a Tool Explaining Ambiguity Sources , 2010, REFSQ.

[8]  A. Kilgarriff,et al.  Disambiguating coordinations using word distribution information , 2005 .

[9]  Luisa Mich,et al.  NL-OOPS: from natural language to object oriented requirements using the natural language processing system LOLITA , 1996, Natural Language Engineering.

[10]  A. Kilgarriff,et al.  Thesauruses for natural language processing , 2003, International Conference on Natural Language Processing and Knowledge Engineering.

[11]  Muneera Bano,et al.  Interview Review: An Empirical Study on Detecting Ambiguities in Requirements Elicitation Interviews , 2018, REFSQ.

[12]  Didar Zowghi,et al.  Ambiguity in Requirements Engineering: Towards a Unifying Framework , 2019, From Software Engineering to Formal Methods and Tools, and Back.

[13]  Alistair Mavin,et al.  Easy Approach to Requirements Syntax (EARS) , 2009, 2009 17th IEEE International Requirements Engineering Conference.

[14]  Luisa Mich,et al.  Requirements for tools for ambiguity identification and measurement in natural language requirements specifications , 2008, Requirements Engineering.

[15]  Vaibhav Jain,et al.  Cross-Domain Ambiguity Detection using Linear Transformation of Word Embedding Spaces , 2020, REFSQ Workshops.

[16]  Mehrdad Sabetzadeh,et al.  Automated Extraction and Clustering of Requirements Glossary Terms , 2017, IEEE Transactions on Software Engineering.

[17]  Barbara Paech,et al.  Detecting Ambiguities in Requirements Documents Using Inspections , 2001 .

[18]  Angela Fogarolli,et al.  Word Sense Disambiguation Based on Wikipedia Link Structure , 2009, 2009 IEEE International Conference on Semantic Computing.

[19]  Doris L. Carver,et al.  An efficient wikipedia-based approach for better understanding of natural language text related to user requirements , 2018, 2018 IEEE Aerospace Conference.

[20]  Davide Dell'Anna,et al.  Requirements Classification with Interpretable Machine Learning and Dependency Parsing , 2019, 2019 IEEE 27th International Requirements Engineering Conference (RE).

[21]  Francis Chantree,et al.  Identifying Nocuous Ambiguities in Natural Language Requirements , 2006, 14th IEEE International Requirements Engineering Conference (RE'06).

[22]  Carlo Strapparava,et al.  Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources , 2014, LREC.

[23]  A. Kilgarriff,et al.  Detecting dangerous coordination ambiguities using word distribution , 2007 .

[24]  Mehrdad Sabetzadeh,et al.  Automated Checking of Conformance to Requirements Templates Using Natural Language Processing , 2015, IEEE Transactions on Software Engineering.

[25]  Daniel M. Berry,et al.  The Design of SREE - A Prototype Potential Ambiguity Finder for Requirements Specifications and Lessons Learned , 2013, REFSQ.

[26]  Patrick Pantel,et al.  An Unsupervised Approach to Prepositional Phrase Attachment using Contextually Similar Words , 2000, ACL.

[27]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[28]  Steven T. Piantadosi,et al.  The communicative function of ambiguity in language , 2011, Cognition.

[29]  Andrés Montoyo,et al.  Advances on natural language processing , 2007, Data Knowl. Eng.

[30]  Bashar Nuseibeh,et al.  Automatic detection of nocuous coordination ambiguities in natural language requirements , 2010, ASE '10.

[31]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[32]  Arpit Sharma,et al.  On the Use of Word Embeddings for Identifying Domain Specific Ambiguities in Requirements , 2019, 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW).

[33]  Gerhard Weikum.  Foundations of statistical natural language processing , 2002 .

[34]  Sjaak Brinkkemper,et al.  Detecting terminological ambiguity in user stories: Tool and experimentation , 2019, Inf. Softw. Technol.

[35]  Christian Wartena,et al.  Using Word Embeddings for Unsupervised Acronym Disambiguation , 2018, COLING.

[36]  Fabian de Bruijn,et al.  Ambiguity in Natural Language Software Requirements: A Case Study , 2010, REFSQ.

[37]  Yue Zhang,et al.  Fast and Accurate Shift-Reduce Constituent Parsing , 2013, ACL.

[38]  Devesh C. Jinwala,et al.  Resolving Ambiguities in Natural Language Software Requirements: A Comprehensive Survey , 2015, SOEN.

[39]  Cristina Ribeiro,et al.  The prevalence and severity of persistent ambiguity in software requirements specifications: Is a special effort needed to find them? , 2020, Science of Computer Programming.

[40]  Stefan Evert,et al.  Scalable Construction of High-Quality Web Corpora , 2013, J. Lang. Technol. Comput. Linguistics.

[41]  Simone Paolo Ponzetto,et al.  WikiRelate! Computing Semantic Relatedness Using Wikipedia , 2006, AAAI.

[42]  Kenneth Ward Church,et al.  Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table , 1982, CL.

[43]  Daniel M. Berry,et al.  Can Rules of Inferences Resolve Coordination Ambiguity in Natural Language Requirements Specification? , 2008, WER.

[44]  Piek T. J. M. Vossen,et al.  SemEval-2010 Task 17: All-Words Word Sense Disambiguation on a Specific Domain , 2009, *SEMEVAL.

[45]  Jianwei Niu,et al.  Disambiguating Requirements Through Syntax-Driven Semantic Analysis of Information Types , 2020, REFSQ.

[46]  Fernanda Ferreira,et al.  Processing Coordination Ambiguity , 2010, Language and speech.

[47]  Stefan Evert,et al.  Google Web 1T 5-Grams Made Easy (but not for the computer) , 2010, WAC@NAACL-HLT.

[48]  Gianluca Trentanni,et al.  QuARS: A Pioneer Tool for NL Requirement Analysis , 2019, From Software Engineering to Formal Methods and Tools, and Back.

[49]  Timothy Baldwin,et al.  Improving Parsing and PP Attachment Performance with Sense Information , 2008, ACL.

[50]  Stefania Gnesi,et al.  Detecting requirements defects with NLP patterns: an industrial experience in the railway domain , 2018, Empirical Software Engineering.

[51]  Vincenzo Gervasi,et al.  On the Systematic Analysis of Natural Language Requirements with CIRCE , 2006, Automated Software Engineering.

[52]  Tobias Hawker USYD: WSD and Lexical Substitution using the Web1T corpus , 2007, SemEval@ACL.

[53]  Mirella Lapata,et al.  Measuring Distributional Similarity in Context , 2010, EMNLP.

[54]  David Lo,et al.  A comparative study on the effectiveness of part-of-speech tagging techniques on bug reports , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[55]  Andrea Esuli,et al.  An NLP approach for cross-domain ambiguity detection in requirements engineering , 2019, Automated Software Engineering.

[56]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[57]  Daniel Toews,et al.  Determining Domain-Specific Differences of Polysemous Words Using Context Information , 2019, REFSQ Workshops.

[58]  Jason S. Chang,et al.  WriteAhead: Mining Grammar Patterns in Corpora for Assisted Writing , 2015, ACL.

[59]  Erik Kamsties,et al.  Taming Ambiguity in Natural Language Requirements , 2005 .

[60]  Bashar Nuseibeh,et al.  Analysing anaphoric ambiguity in natural language requirements , 2011, Requirements Engineering.

[61]  Erik Kamsties,et al.  From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity , 2003 .

[62]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res.

[63]  Timothy Baldwin,et al.  Automatic Evaluation of Topic Coherence , 2010, NAACL.

[64]  Preslav Nakov,et al.  Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution , 2005, HLT.

[65]  Stefania Gnesi,et al.  Using NLP to Detect Requirements Defects: An Industrial Experience in the Railway Domain , 2017, REFSQ.

[66]  Carson T. Schütze PP attachment and argumenthood , 1995 .

[67]  G. Leech 100 million words of English , 1993, English Today.

[68]  Bashar Nuseibeh,et al.  A Methodology for Automatic Identification of Nocuous Ambiguity , 2010, COLING.

[69]  Stefan Wagner,et al.  Rapid quality assurance with Requirements Smells , 2016, J. Syst. Softw.

[70]  Laurel J. Brinton,et al.  The Structure of Modern English: A linguistic introduction , 2000 .

[71]  Mehrdad Sabetzadeh,et al.  Extracting domain models from natural-language requirements: approach and industrial evaluation , 2016, MoDELS.

[72]  A. Willis,et al.  Automatic Identification of Nocuous Ambiguity , 2008 .

[73]  Iryna Gurevych,et al.  Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary , 2008, LREC.

[74]  Alexander F. Gelbukh,et al.  Improving Prepositional Phrase Attachment Disambiguation Using the Web as Corpus , 2003, CIARP.

[75]  Miriam Goldberg,et al.  An Unsupervised Model for Statistically Determining Coordinate Phrase Attachment , 1999, ACL.

[76]  Mehrdad Sabetzadeh,et al.  Automated Extraction of Semantic Legal Metadata using Natural Language Processing , 2018, 2018 IEEE 26th International Requirements Engineering Conference (RE).

[77]  Kyoko Kanzaki,et al.  Advances in Natural Language Processing , 2012, Lecture Notes in Computer Science.

[78]  J. R. Landis,et al.  An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers , 1977, Biometrics.