Improving requirements glossary construction via clustering: approach and industrial case studies

Context. A glossary is an important part of any software requirements document. By making the technical terms in a domain explicit and providing definitions for them, a glossary helps mitigate ambiguities. Goal. A necessary step in building a glossary is to decide upon the glossary terms and to identify their related terms. Doing so manually is laborious. Our objective is to provide automated support for identifying candidate glossary terms and their related terms. Our work differs from existing work on term extraction mainly in that, instead of providing a flat list of candidate terms, our approach clusters the terms by relevance. Method. We use case study research as the basis for our empirical investigation. Results. We present an automated approach for identifying and clustering candidate glossary terms. We evaluate the approach through two industrial case studies: one concerns a satellite software component, and the other an evidence management tool for safety certification. Conclusions. Our results indicate that, when applied to requirements documents: (1) our approach is more accurate than existing methods for identifying candidate glossary terms, which makes it less likely that our approach will miss important glossary terms; and (2) clustering provides an effective basis for grouping related terms, which makes it a useful support tool for selecting glossary terms and associating them with their related terms.
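
To make the idea of clustering candidate terms (rather than listing them flat) concrete, here is a minimal sketch in Python. It is not the approach evaluated in the paper: the abstract names neither the similarity measure nor the clustering algorithm, so the token-level Jaccard similarity, the single-link grouping, the `threshold` value, and the example terms below are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two terms (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster_terms(terms, threshold=0.3):
    """Single-link grouping: terms whose similarity meets the
    (hypothetical) threshold end up in the same cluster."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical candidate terms, not taken from the case studies.
candidates = [
    "ground station",
    "ground station interface",
    "telemetry packet",
    "telemetry frame",
    "safety evidence",
]
clusters = cluster_terms(candidates)
for group in clusters:
    print(group)
```

With these inputs, terms that share tokens are grouped together, so an analyst reviewing "ground station" would see "ground station interface" alongside it instead of having to find it in a flat list.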
