Reverse Engineering Variability from Natural Language Documents: A Systematic Literature Review

Identifying features and their relations (i.e., variation points) is crucial when migrating single software systems to software product lines (SPLs). Various approaches have been proposed to extract features automatically from different artifacts, for instance, feature location in legacy code. Such approaches usually a) omit variability information and b) rely on artifacts that reside in advanced phases of the development process, and are thus of only limited usefulness in the context of SPLs. In contrast, feature and variability extraction from natural language (NL) documents is more favorable, because a mapping to several other artifacts is usually established from the very beginning. In this paper, we provide a multi-dimensional overview of approaches for feature and variability extraction from NL documents by means of a systematic literature review (SLR). We selected 25 primary studies and carefully evaluated them regarding different aspects such as techniques used, tool support, and accuracy of the results. In a nutshell, our key insights are that i) standard NLP techniques are commonly used, ii) post-processing often includes clustering and machine learning algorithms, iii) the approaches support variability extraction only in rare cases, iv) tool support, apart from text pre-processing, is often not available, and v) many approaches lack a comprehensive evaluation. Based on these observations, we derive future challenges, arguing that more effort needs to be invested to make such approaches applicable in practice.
