A Qualitative Investigation of Insecure Code Propagation from Online Forums

Research demonstrates that code snippets posted on programming-oriented online forums (e.g., Stack Overflow) – including snippets containing security mistakes – make their way into production code. Prior work also shows that software developers who reference Stack Overflow in their development cycle produce less secure code. While there are many plausible explanations for why developers propagate insecure code in this manner, there is little or no empirical evidence to support any of them. To address this gap, we identify Stack Overflow code snippets that contain security errors and find clones of these snippets in open source GitHub repositories. We then survey (n=133) and interview (n=15) the authors of these GitHub repositories to explore how and why these errors were introduced. We find that some developers (perhaps mistakenly) trust their security skills to validate the code they import, but the majority admit they would need to learn more about security before they could properly perform such validation. Further, although some prioritize functionality over security, others believe that ensuring security is not, or should not be, their responsibility. Our results have implications for efforts to reduce the propagation of this insecure code.
