Code Reuse in Stack Overflow and Popular Open Source Java Projects

Solutions provided on Question and Answer (Q&A) websites such as Stack Overflow are regularly used in Open Source Software (OSS). However, many developers are unaware that both Stack Overflow and OSS are governed by licenses. Hence, developers who reuse Stack Overflow code in their OSS projects may violate licensing agreements if they do not attribute it correctly. Additionally, if code migrates from one OSS project through Stack Overflow to another, complex licensing issues are likely to arise. Such forms of software reuse also have implications for future software maintenance, particularly where developers have a poor understanding of the copied code. This paper investigates code reuse between these two platforms (i.e., Stack Overflow and OSS), with the aim of providing insights into this issue. We mined 151,946 Java code snippets from Stack Overflow, 16,617 Java files from 12 of the top weekly listed projects on SourceForge and GitHub, and 39,616 Java files from the 20 most popular Java projects on SourceForge. Our analyses aimed to find the number of clones (indicating reuse) (a) within Stack Overflow posts, (b) between Stack Overflow and popular Java OSS projects, and (c) between the projects. The outcomes reveal up to 3.3% code reuse within Stack Overflow, while 1.0% of Stack Overflow code was reused in recent popular Java projects and 2.3% in more established projects. Reuse across projects was much higher, accounting for as much as 77.2%. Our outcomes have implications for strategies aimed at introducing strict quality assurance measures to ensure the appropriateness of code reuse and awareness of licensing requirements.
