Concept-Based Classification of Software Defect Reports

Automatically identifying the defect type from the textual description of a software defect can significantly speed up and improve the software defect management life cycle. The research community has recognized this, and multiple solutions based on supervised learning have been proposed in the recent literature. However, these approaches need a significant amount of labeled training data before they can be used in real-life projects. In this paper, we propose to use Explicit Semantic Analysis (ESA) to carry out concept-based classification of software defect reports. We compute the "semantic similarity" between each defect type label and the defect report in a concept space spanned by Wikipedia articles, and then assign the defect type that has the highest similarity with the defect report. This approach circumvents the dependence on labeled training data. Experimental results show that concept-based classification is a promising approach for software defect classification: it avoids the expensive process of creating labeled training data and yet achieves accuracy comparable to that of traditional supervised learning approaches. To the best of our knowledge, this is the first use of Wikipedia and ESA for the software defect classification problem.
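
The classification step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the three "concepts" are tiny hypothetical stand-ins for Wikipedia articles, the defect type labels and their descriptions are invented for illustration, and the TF-IDF weighting and cosine similarity are assumed as the usual ESA ingredients. Each text is mapped to a vector of concept weights, and the label whose vector is most similar to the report's vector is assigned.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (concept title -> article text).
CONCEPTS = {
    "Graphical user interface": "window button menu display screen layout click icon",
    "Memory management": "memory allocation leak heap pointer free buffer",
    "Computer network": "network socket connection timeout packet server request",
}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(concepts):
    """Build an inverted index: term -> {concept: TF-IDF weight}."""
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    n = len(concepts)
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n / df[term]) + 1.0  # smoothed so weights stay positive
            index.setdefault(term, {})[name] = count * idf
    return index

def esa_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * b.get(concept, 0.0) for concept, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(report, labels, index):
    """Return the label most similar to the report, plus all scores."""
    report_vec = esa_vector(report, index)
    scores = {
        label: cosine(report_vec, esa_vector(description, index))
        for label, description in labels.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical defect type labels, each with a short textual description.
LABELS = {
    "interface": "button menu display",
    "memory": "memory leak",
    "network": "network connection",
}

INDEX = build_index(CONCEPTS)
best, scores = classify(
    "Application crashes with heap memory leak after buffer allocation",
    LABELS,
    INDEX,
)
print(best, scores)
```

No labeled defect reports are needed in this sketch: the only inputs are the concept corpus and the label descriptions. With a real Wikipedia-derived index, the concept vectors would have many thousands of dimensions and would overlap across labels, so the similarity scores would be less clear-cut than in this toy corpus.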
