ATLaS: A Framework for Traceability Links Recovery Combining Information Retrieval and Semi-Supervised Techniques

Current Model-Based Systems Engineering (MBSE) practices for designing and implementing complex systems require modeling and analysis across many representations: structure, dynamics, safety, security, etc. This produces a large volume of overlapping, heterogeneous artefacts that are subject to frequent changes during the project life cycle. To verify and validate system requirements and ensure that models meet users' needs, MBSE techniques must rely on consistent traceability management. In this paper, we investigate the benefits of Information Retrieval (IR) techniques and the latest advances in Natural Language Processing (NLP) for suggesting candidate semantic links to stakeholders, generated by processing structured and unstructured content. We illustrate our approach, called ATLaS (Aggregation Trace Links Support), through an application to the design and analysis of a mobility service involving several industrial partners. We provide an empirical evaluation of its limitations as part of an industrial MBSE process. Most importantly, we highlight how our method drastically reduces the number of false-positive links generated compared to current IR techniques. The results suggest a good synergy between the presented approach and MBSE techniques.
