Recommending Frequently Encountered Bugs

Developers introduce bugs during software development, which reduces software reliability. Many of these bugs occur commonly and have been experienced by many other developers. Informing developers, especially novice ones, about commonly occurring bugs in a domain of interest (e.g., Java) can help them comprehend programs and avoid similar bugs in the future. Unfortunately, information about commonly occurring bugs is not readily available. To address this need, we propose a novel approach named RFEB, which recommends frequently encountered bugs (FEBugs) that may affect many other developers. RFEB analyzes Stack Overflow, which is the largest software engineering-specific Q&A community. Many of the questions posted on Stack Overflow describe different kinds of bugs and their solutions. However, the search engine that comes with Stack Overflow is not able to identify FEBugs well. To address this limitation, RFEB uses an integrated and iterative approach that considers both the relevance and the popularity of Stack Overflow questions to identify FEBugs. To evaluate the performance of RFEB, we perform experiments on a dataset from Stack Overflow that contains more than ten million posts. We compare our model with Stack Overflow's search engine on 10 domains, and the experimental results show that RFEB achieves an average NDCG@10 score of 0.96, which improves on Stack Overflow's search engine by 20%.
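
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how an NDCG@10 score can be computed for one ranked list of results. It assumes the common formulation in which the gain at rank i is rel_i / log2(i + 1) and the ideal ordering is obtained by sorting the same graded judgments; the abstract does not state which NDCG variant or relevance scale was used, so the function names and the example grades here are illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance scores."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: the ideal ranking is the same judgments sorted in
    descending order. Returns 0.0 when every judgment is zero.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments for the results returned for one domain query,
# listed in the order the ranker returned them.
example_ranking = [1, 3]
score = ndcg_at_k(example_ranking)
```

A ranking that already places the most relevant questions first scores 1.0, and the score drops as relevant questions are pushed down the list. An average NDCG@10 such as the one reported above would be the mean of this per-query score over all evaluated domains.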
