Using Stack Overflow content to assist in code review

An essential goal for programmers is to minimize the cost of identifying and correcting defects in source code. Code review is commonly used for identifying programming defects. However, manual code review has some shortcomings: (1) it is time‐consuming and (2) outcomes are subjective and depend on the skills of reviewers. An automated approach for assisting in code reviews is thus highly desirable. We present a tool for assisting in code review and results from our experiments evaluating the tool in different scenarios. The tool leveraged content available from professional programmer support forums (eg, StackOverflow.com) to determine potential defectiveness of a given piece of source code. The defectiveness is expressed on the scale of {Likely defective, neutral, unlikely to be defective}. The basic idea employed in the tool is (1) to identify a set P of discussion posts on Stack Overflow such that each p∈P contains source code fragment(s), which sufficiently resemble the input code C being reviewed, and (2) to determine the likelihood of C being defective by considering all p∈P. A novel aspect of our approach is to use document fingerprinting for comparing two pieces of source code. Our choice of document fingerprinting technique is inspired by source code plagiarism detection tools where it has proven to be very successful. In the experiments that we performed to verify the effectiveness of our approach, source code samples from more than 300 GitHub open‐source repositories were taken as input. An F1 score of 0.94 has been achieved in identifying correct/relevant results.

[1]  Erik Cambria,et al.  Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article] , 2014, IEEE Computational Intelligence Magazine.

[2]  Tao Xie,et al.  Parseweb: a programmer assistant for reusing open source code on the web , 2007, ASE.

[3]  Jens Grabowski,et al.  Intuition vs. Truth: Evaluation of Common Myths about StackOverflow Posts , 2015, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories.

[4]  Alexander Serebrenik,et al.  StackOverflow and GitHub: Associations between Software Development and Crowdsourced Knowledge , 2013, 2013 International Conference on Social Computing.

[5]  Eric Gilbert,et al.  VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text , 2014, ICWSM.

[6]  Michele Lanza,et al.  Seahawk: Stack Overflow in the IDE , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[7]  Eran Yahav,et al.  Typestate-based semantic code search over partial programs , 2012, OOPSLA '12.

[8]  Gary McGraw,et al.  Automated Code Review Tools for Security , 2008, Computer.

[9]  David Wright,et al.  Some Conservative Stopping Rules for the Operational Testing of Safety-Critical Software , 1997, IEEE Trans. Software Eng..

[10]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[11]  Shane McIntosh,et al.  An empirical study of design discussions in code review , 2018, ESEM.

[12]  Adam Kolawa,et al.  Automated Defect Prevention , 2007 .

[13]  Gail C. Murphy,et al.  Hipikat: recommending pertinent software development artifacts , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[14]  David Lo,et al.  An empirical study on developer interactions in StackOverflow , 2013, SAC '13.

[15]  Adam Kolawa,et al.  Automated Defect Prevention , 2007 .

[16]  Steve McConnell,et al.  Code complete - a practical handbook of software construction, 2nd Edition , 1993 .

[17]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[18]  Tommaso Di Noia,et al.  Ontology-Driven Pattern Selection and Matching in Software Design , 2014, ECSA.

[19]  Daniel Shawcross Wilkerson,et al.  Winnowing: local algorithms for document fingerprinting , 2003, SIGMOD '03.

[20]  Gabriele Bavota,et al.  Mining StackOverflow to turn the IDE into a self-confident programming prompter , 2014, MSR 2014.

[21]  Wendy D. Cornell,et al.  Application of an automated natural language processing (NLP) workflow to enable federated search of external biomedical content in drug discovery and development. , 2016, Drug discovery today.

[22]  Christoph Treude,et al.  How do programmers ask and answer questions on the web?: NIER track , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[23]  Yair Movshovitz-Attias,et al.  Analysis of the reputation system and user contributions on a question answering website: StackOverflow , 2013, 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013).

[24]  Chanchal Kumar Roy,et al.  Towards a context-aware IDE-based meta search engine for recommendation about programming errors and exceptions , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).