Stack Overflow: A code laundering platform?

Developers use Question and Answer (Q&A) websites to exchange knowledge and expertise. Stack Overflow is a popular Q&A website where developers discuss coding problems and share code examples. Although all Stack Overflow posts are free to access, code examples on Stack Overflow are governed by the Creative Commons Attribution-ShareAlike 3.0 Unported license, which developers should obey when reusing code from Stack Overflow or posting code to Stack Overflow. In this paper, we conduct a case study with 399 Android apps to investigate whether developers respect license terms when reusing code from Stack Overflow posts (and the other way around). We found 232 code snippets in 62 Android apps from our dataset that were potentially reused from Stack Overflow, and 1,226 Stack Overflow posts containing code examples that are clones of code released in 68 Android apps, suggesting that developers may have copied the code of these apps to answer Stack Overflow questions. We investigated the licenses of these pieces of code and observed 1,279 cases of potential license violations (related to posting code to Stack Overflow or reusing code from Stack Overflow). This paper aims to raise awareness in the software engineering community of potentially unethical code reuse activities taking place on Q&A websites like Stack Overflow.
