Detecting Domain-Specific Ambiguities: An NLP Approach Based on Wikipedia Crawling and Word Embeddings

In the software process, natural language (NL) ambiguities left unresolved in the early requirements phases may cause problems in later stages of development. Although methods exist to detect domain-independent ambiguities, ambiguity also depends on the domain-specific background of the stakeholders involved in the requirements process. In this paper, we aim to estimate the degree of ambiguity of typical computer science words (e.g., system, database, interface) when they are used in different application domains. To this end, we apply a natural language processing (NLP) approach based on Wikipedia crawling and word embeddings, a novel technique that represents the meaning of words as compact numerical vectors. Our preliminary experiments, performed on five different domains, show promising results: the approach yields an estimate of how much the meaning of computer science words varies across domains. With further validation, the method could indicate which words the requirements analyst needs to define carefully in advance, to avoid misunderstandings when editing documents and dealing with experts in the considered domains.
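To make the idea of comparing a word's meaning across domains concrete, the following is a minimal sketch and not the method used in the paper. It assumes that simple co-occurrence count vectors can stand in for learned word embeddings, and that a few made-up sentences can stand in for crawled Wikipedia pages. The function names (`context_vector`, `cosine`, `ambiguity_score`), the window size, and the "1 minus cosine similarity" score are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, target, window=2):
    """Count the words that appear within `window` tokens of `target`.

    This count vector is a crude stand-in for a learned word embedding.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every neighbour in the window except the target itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ambiguity_score(corpus_a, corpus_b, target, window=2):
    """Hypothetical score: 1 - cosine similarity of the target's contexts."""
    return 1.0 - cosine(
        context_vector(corpus_a, target, window),
        context_vector(corpus_b, target, window),
    )


# Toy, made-up sentences standing in for crawled domain-specific pages.
medical = [
    "the nervous system controls the body",
    "the immune system protects the body",
]
software = [
    "the operating system schedules each process",
    "the file system stores each record",
]

score_cross = ambiguity_score(medical, software, "system")
score_same = ambiguity_score(medical, medical, "system")
print(f"cross-domain: {score_cross:.3f}, same-domain: {score_same:.3f}")
```

Under these assumptions, a word whose contexts differ between two domains receives a higher score than the same word compared within a single domain. A realistic pipeline would replace the count vectors with embeddings trained on much larger domain corpora.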
