Security and emotion: sentiment analysis of security discussions on GitHub

Application security is receiving increasing attention in software development, and especially in web application development. Consequently, countermeasures are continuously being discussed and built into applications, with the goal of reducing the risk that unauthorized code will be able to access, steal, modify, or delete sensitive data. In this paper we gauged the prevalence of security-related discussions on GitHub and the atmosphere surrounding them, as mined from discussions around commits and pull requests. First, we found that security-related discussions account for approximately 10% of all discussions on GitHub. Second, we found that more negative emotions are expressed in security-related discussions than in other discussions. These findings confirm the importance of properly training developers to address security concerns in their applications, as well as the need to test applications thoroughly for security vulnerabilities, in order to reduce frustration and improve the overall project atmosphere.
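To make the two measurements concrete, the following is a minimal sketch of how a share of security-related comments and a per-group sentiment score could be computed. It is not the pipeline used in this paper: the keyword list, the polarity lexicon, and the function names (`is_security_related`, `polarity`, `summarize`) are all hypothetical and chosen only for illustration, and a real study would rely on a proper security glossary and an established sentiment analysis tool.

```python
import re
from statistics import mean

# Hypothetical keyword list and polarity lexicon, for illustration only.
SECURITY_KEYWORDS = {
    "vulnerability", "exploit", "xss", "csrf",
    "injection", "authentication", "encryption",
}
POLARITY = {
    "great": 1, "thanks": 1, "nice": 1, "good": 1,
    "broken": -1, "terrible": -1, "annoying": -1, "bad": -1,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_security_related(text):
    """Flag a comment if it contains at least one security keyword."""
    return any(tok in SECURITY_KEYWORDS for tok in tokenize(text))


def polarity(text):
    """Sum the lexicon scores of all tokens (negative means negative tone)."""
    return sum(POLARITY.get(tok, 0) for tok in tokenize(text))


def summarize(comments):
    """Return the security share and the mean polarity of each group."""
    security = [c for c in comments if is_security_related(c)]
    other = [c for c in comments if not is_security_related(c)]
    return {
        "security_share": len(security) / len(comments) if comments else 0.0,
        "security_mean_polarity":
            mean([polarity(c) for c in security]) if security else 0.0,
        "other_mean_polarity":
            mean([polarity(c) for c in other]) if other else 0.0,
    }


if __name__ == "__main__":
    # Made-up comments, not data from the study.
    sample = [
        "This XSS hole is terrible and annoying",
        "Thanks, nice refactoring",
        "Good catch, merged",
        "Docs look fine",
    ]
    print(summarize(sample))
```

Comparing `security_mean_polarity` with `other_mean_polarity` mirrors the second finding in spirit only; a simple word-count lexicon ignores negation and context, so it should be read as a toy stand-in for a real sentiment classifier.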
