Adapting Word Embeddings to Traceability Recovery

Maintaining the traceability links of a software system is a tedious and error-prone task, yet it is an essential requirement. Information retrieval has been applied to help generate traceability links, which are usually determined by the similarity between two artifacts. However, existing methods are mainly based on the vector space model, topic models, and similar techniques, which ignore word semantics. To address this, this paper adapts the popular word embedding technique to traceability recovery tasks and handles out-of-vocabulary words at test time. Finally, a machine learning method (learning to rank) is used to improve the final result. Several comparative experiments are conducted on five public datasets, and the proposed approach outperforms the baseline methods under the same conditions.
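As a rough illustration of similarity-based link recovery with word embeddings, the following minimal sketch ranks candidate artifacts by the cosine similarity of averaged word vectors. It is not the paper's method: the toy embedding values, the function names (`artifact_vector`, `cosine`, `rank_candidates`), and the artifact names are all hypothetical, and out-of-vocabulary words are simply skipped here as a simplifying assumption, since the abstract does not say how they are handled. The learning-to-rank step is not shown.

```python
import math

# Toy 3-dimensional embeddings; hypothetical values for illustration only.
EMBEDDINGS = {
    "user": [0.9, 0.1, 0.0],
    "login": [0.8, 0.2, 0.1],
    "password": [0.7, 0.3, 0.0],
    "render": [0.0, 0.1, 0.9],
    "image": [0.1, 0.0, 0.8],
}


def artifact_vector(tokens, embeddings=EMBEDDINGS):
    """Average the vectors of known tokens.

    Assumption: out-of-vocabulary tokens are skipped. Returns None when
    no token of the artifact is in the vocabulary.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(source_tokens, candidates):
    """Return (name, score) pairs sorted by descending similarity."""
    source_vec = artifact_vector(source_tokens)
    scored = [
        (name, cosine(source_vec, artifact_vector(tokens)))
        for name, tokens in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# "frobnicate" is out of vocabulary and is ignored by artifact_vector.
requirement = ["user", "login", "frobnicate"]
candidates = {
    "AuthService": ["login", "password"],
    "ImageView": ["render", "image"],
}
ranking = rank_candidates(requirement, candidates)
```

In this toy setup the requirement should rank `AuthService` above `ImageView`, because its averaged vector points in nearly the same direction as the login-related terms. In practice the embeddings would be trained on a large corpus, and the similarity score could serve as one feature among several for a ranking model.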
