Semantically Enhanced Software Traceability Using Deep Learning Techniques

In most safety-critical domains, the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases, and other artifacts; however, creating such links manually is time-consuming and error-prone. Automated solutions use information retrieval and machine learning techniques to generate trace links, but current techniques fail to understand the semantics of software artifacts or to integrate domain knowledge into the tracing process, and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that uses word embedding and Recurrent Neural Network (RNN) models to generate trace links. The word embedding model learns word vectors that represent knowledge of the domain corpus, and the RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly outperformed state-of-the-art tracing methods, including the Vector Space Model and Latent Semantic Indexing.
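
To make the architecture concrete, the following is a minimal sketch of how word vectors can feed a bidirectional GRU that encodes two artifacts, whose encodings are then compared to score a candidate trace link. It is not the network from the paper: the weights and embeddings are random and untrained, the dimensions are arbitrary, the class and function names (GRUCell, BiGRUEncoder, cosine) are hypothetical, and cosine similarity is only a stand-in for whatever comparison layer the trained network uses. The GRU update follows one common convention, h_t = (1 - z) * h_{t-1} + z * candidate.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product over nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU layer with random (untrained) weights; biases omitted."""

    def __init__(self, rng, input_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(rng, hidden_dim, input_dim) for g in "zrh"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrh"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in
                zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, vectors):
        # Return the final hidden state after reading the whole sequence.
        h = [0.0] * self.hidden_dim
        for x in vectors:
            h = self.step(x, h)
        return h


class BiGRUEncoder:
    """Embeds tokens, then concatenates forward and backward GRU states."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # Assumption: random vectors stand in for embeddings that would
        # normally be learned from a domain corpus.
        self.embeddings = {w: [rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                           for w in vocab}
        self.forward = GRUCell(rng, embed_dim, hidden_dim)
        self.backward = GRUCell(rng, embed_dim, hidden_dim)

    def encode(self, text):
        unknown = [0.0] * self.embed_dim  # out-of-vocabulary tokens
        vectors = [self.embeddings.get(tok, unknown)
                   for tok in text.lower().split()]
        return self.forward.run(vectors) + self.backward.run(vectors[::-1])


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical artifacts, invented for illustration only.
requirement = "the onboard unit shall enforce the speed limit"
design = "speed limit enforcement module in the onboard unit"
vocab = sorted(set((requirement + " " + design).split()))

encoder = BiGRUEncoder(vocab)
score = cosine(encoder.encode(requirement), encoder.encode(design))
print(f"candidate trace link score: {score:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; in a trained network the embeddings and GRU parameters would be fitted on existing trace links so that linked artifact pairs score higher than unlinked ones.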
