Topic Cohesion Preserving Requirements Clustering

This paper addresses the problem of generating human-interpretable clusters of semantically related plain-text requirements. The presented approach applies techniques from information retrieval, natural language processing, network analysis, and machine learning to identify semantically central terms as themes and to cluster requirements into semantically coherent groups. Each cluster is associated with meaningful explanatory themes that help users comprehend it. The approach is generic and can be applied to other phases of the software development life cycle (SDLC), including code comprehension and architectural discovery. It is particularly suitable for developing automated tool support for requirements management and analysis.
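
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes TF-IDF vectors with cosine similarity for the information-retrieval step, single-link grouping at a fixed threshold as a stand-in for the clustering step, and term degree in a per-cluster co-occurrence graph as a stand-in for "semantically central terms". The stop list, the threshold value, the sample requirements, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Tiny illustrative stop list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "be", "for", "in", "is", "must", "of",
             "shall", "system", "the", "to", "user"}

def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOPWORDS]

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized requirement."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(vectors, threshold):
    """Single-link grouping: join requirements whose similarity exceeds threshold."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            parent[find(j)] = find(i)
    groups = defaultdict(list)
    for i in range(len(vectors)):
        groups[find(i)].append(i)
    return sorted(groups.values())

def themes(docs, members, k):
    """Top-k terms by degree in the cluster's term co-occurrence graph."""
    neighbors = defaultdict(set)
    for i in members:
        for a, b in combinations(set(docs[i]), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Highest degree first; ties broken alphabetically for determinism.
    return sorted(neighbors, key=lambda t: (-len(neighbors[t]), t))[:k]

# Hypothetical sample requirements, for illustration only.
requirements = [
    "The system shall let the user reset a password by email.",
    "The user must change the password after the first login.",
    "The system shall export the monthly report as PDF.",
    "The monthly report shall include a sales chart.",
]
docs = [tokenize(r) for r in requirements]
clusters = cluster(tfidf_vectors(docs), threshold=0.02)
labels = [themes(docs, members, k=2) for members in clusters]
for members, label in zip(clusters, labels):
    print(label, [requirements[i] for i in members])
```

In this toy input the two password requirements and the two report requirements end up in separate groups, and the terms that co-occur with the most other terms inside a group serve as its explanatory label. A realistic pipeline would likely need a proper stop list, lemmatization, and a more robust clustering criterion than a fixed threshold.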
