Software Bug Classification using Suffix Tree Clustering (STC) Algorithm

Suffix Tree Clustering (STC) is a popular text clustering algorithm. It has a number of applications, the best known being web document clustering. Software bug data contains a number of attributes, such as bug ID, summary (title), description, comments, status, and version, and most of the important attributes hold text data. Since most of the data in software bug repositories is text, STC can be applied to cluster software bug records. In this paper, the STC algorithm is used for software bug classification: clusters are first created from the bug repositories, and a label is then assigned to each cluster to indicate its class. An STC implementation is available as part of the Carrot2 framework. The designed technique is evaluated using common clustering parameters.
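
The two steps described above (cluster bug records by shared phrases, then label each cluster) can be illustrated with a small Python sketch. This is not the Carrot2 implementation and not necessarily the paper's exact procedure: it assumes a plain word n-gram index in place of a real suffix tree, omits stop-word filtering and base-cluster scoring, and uses the commonly described STC merge rule (two base clusters are merged when each shares more than half of its documents with the other). All function names, parameters, and sample bug summaries are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def phrases(text, max_len=3):
    """Yield every word n-gram (1..max_len words) of a bug summary."""
    words = text.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def base_clusters(docs, min_docs=2, max_len=3):
    """Map each phrase shared by at least min_docs records to those records.

    Assumption: an n-gram index stands in for the suffix tree.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in phrases(text, max_len):
            index[phrase].add(doc_id)
    return {p: ids for p, ids in index.items() if len(ids) >= min_docs}


def merge_clusters(base, threshold=0.5):
    """Merge overlapping base clusters and label each merged cluster."""
    items = list(base.items())
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        a, b = items[i][1], items[j][1]
        common = len(a & b)
        # Merge when the overlap exceeds the threshold in both directions.
        if common / len(a) > threshold and common / len(b) > threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(items)):
        groups[find(i)].append(i)

    clusters = []
    for members in groups.values():
        doc_ids = set().union(*(items[m][1] for m in members))
        # Label: the longest phrase, preferring the one covering more records.
        label = max((items[m][0] for m in members),
                    key=lambda p: (len(p), len(base[p])))
        clusters.append((" ".join(label), sorted(doc_ids)))
    return clusters


# Hypothetical bug summaries keyed by bug ID.
bugs = {
    101: "null pointer exception on save",
    102: "null pointer exception when opening file",
    103: "toolbar icons missing after update",
    104: "toolbar icons missing in dark theme",
}

for label, ids in sorted(merge_clusters(base_clusters(bugs))):
    print(label, ids)
```

On this toy input the sketch groups records 101 and 102 under the label "null pointer exception" and records 103 and 104 under "toolbar icons missing". A real bug repository would need stop-word removal and scoring of base clusters before merging, which is what a full STC implementation provides.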
