Generating Intelligent Summary Terms for Improving Knowledge Discovery in Software Bug Repositories

Software bug records are stored and managed using bug tracking tools. A software bug is characterized by a number of attributes, such as bug ID, opened date, closed date, reported by, assigned to, summary (title), description, and a set of comments. Summary and description are two important attributes of a bug: the description gives detailed information about the bug, whereas the summary (title) gives a short, at-a-glance account of it. The first objective of this study is to examine the relationship between the description and summary attributes of a bug and to determine whether the summary is really a compact and intelligent representation of the description. This finding provides a new direction for faster knowledge discovery in a bug repository. The second objective is to demonstrate that an intelligent summary of a bug can be generated from its description using topic modeling techniques. In this work, topic modeling techniques are used to generate meaningful terms for framing the bug summary. Knowledge discovery can then be performed on the intelligent summary alone, which reduces the volume of data to be processed. To demonstrate the approach, experiments are performed on three popular bug repositories, namely Android, Mozilla, and MySQL. A comparative analysis is carried out using various performance parameters, and to analyze the impact of the work, two knowledge discovery tasks, bug classification and duplicate bug identification, are presented.
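As a rough illustration of generating summary terms from bug descriptions with a topic model, the sketch below fits a small latent Dirichlet allocation (LDA) model by collapsed Gibbs sampling and ranks each description's own words by their probability under the fitted model. The abstract does not say which topic modeling technique or settings the study uses, so LDA, the stopword list, the hyperparameters, the sample descriptions, and every function name here are assumptions made for illustration, not the authors' method.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the study).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "on", "when",
             "it", "after", "with", "for", "not", "does", "this"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and words under 3 letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns the count tables and vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k
    assign = []
    for d, doc in enumerate(docs):                     # random initial assignment
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, z_d):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]                       # remove the current token
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [(doc_topic[d][k] + alpha)
                           * (topic_word[k][w] + beta)
                           / (topic_total[k] + n_vocab * beta)
                           for k in range(n_topics)]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z                       # add it back under the new topic
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, topic_total, vocab

def summary_terms(d, docs, model, n_terms=4, alpha=0.1, beta=0.01):
    """Rank the words of document d by p(w | d) = sum_k theta[d][k] * phi[k][w]."""
    doc_topic, topic_word, topic_total, vocab = model
    n_topics = len(topic_total)
    theta = [(doc_topic[d][k] + alpha) / (len(docs[d]) + n_topics * alpha)
             for k in range(n_topics)]
    scores = {}
    for w in set(docs[d]):
        scores[w] = sum(theta[k] * (topic_word[k][w] + beta)
                        / (topic_total[k] + len(vocab) * beta)
                        for k in range(n_topics))
    # Sort by descending score, breaking ties alphabetically for determinism.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))][:n_terms]

# Hypothetical bug descriptions, invented for this example.
descriptions = [
    "Application crashes when opening the camera after rotating the screen; "
    "crash log shows null pointer in camera preview",
    "Browser freezes while loading large page with many images, "
    "memory usage grows until browser freezes",
    "Camera preview is black after screen rotation and the camera application crashes",
    "Page loading is slow and memory usage is high when many tabs with images are open",
]
docs = [tokenize(text) for text in descriptions]
model = fit_lda(docs, n_topics=2)
for d in range(len(docs)):
    print(d, summary_terms(d, docs, model))
```

The candidate terms are restricted to words that already occur in the description, so the output is a set of terms for framing a summary rather than a fluent sentence. A downstream task such as classification or duplicate detection could then, under the same assumptions, operate on these few terms instead of the full description text.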
