Stack Overflow in Github: Any Snippets There?

When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.

[1]  Gabriele Bavota,et al.  Mining StackOverflow to turn the IDE into a self-confident programming prompter , 2014, MSR 2014.

[2]  Ali Mesbah,et al.  Mining questions asked by web developers , 2014, MSR 2014.

[3]  Siti Rochimah,et al.  Source code retrieval on StackOverflow using LDA , 2015, 2015 3rd International Conference on Information and Communication Technology (ICoICT).

[4]  Zhenchang Xing,et al.  Towards Correlating Search on Google and Asking on Stack Overflow , 2016, 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC).

[5]  Cristina V. Lopes,et al.  SourcererCC: Scaling Code Clone Detection to Big-Code , 2015, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[6]  David Lo,et al.  EnTagRec++: An enhanced tag recommendation system for software information sites , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[7]  Mario Gleirscher,et al.  On the Extent and Nature of Software Reuse in Open Source Java Projects , 2011, ICSR.

[8]  Gabriele Bavota,et al.  Prompter: A Self-Confident Recommender System , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[9]  Georgios Gousios,et al.  GHTorrent: Github's data from a firehose , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[10]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[11]  Alexander Serebrenik,et al.  StackOverflow and GitHub: Associations between Software Development and Crowdsourced Knowledge , 2013, 2013 International Conference on Social Computing.

[12]  Michele Lanza,et al.  Seahawk: Stack Overflow in the IDE , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[13]  Foutse Khomh,et al.  Stack Overflow: A code laundering platform? , 2017, 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[14]  Premkumar T. Devanbu,et al.  How social Q&A sites are changing knowledge sharing in open source software communities , 2014, CSCW.

[15]  Zhendong Su,et al.  A study of the uniqueness of source code , 2010, FSE '10.

[16]  Frank Maurer,et al.  What makes a good code example?: A study of programming Q&A in StackOverflow , 2012, 2012 28th IEEE International Conference on Software Maintenance (ICSM).

[17]  Chanchal Kumar Roy,et al.  Near-miss function clones in open source software : an empirical study , 2009 .

[18]  Jinqiu Yang,et al.  AutoComment: Mining question and answer sites for automatic comment generation , 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[19]  Premkumar T. Devanbu,et al.  On the naturalness of software , 2016, Commun. ACM.

[20]  Cristina V. Lopes,et al.  From Query to Usable Code: An Analysis of Stack Overflow Code Snippets , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).