Software vulnerabilities and other risks continually emerge as new code is introduced or existing code is modified. The consequences can be disastrous, not only for the companies and organizations that provide the software but also for those that use it. These risks could potentially be mitigated if all code could be reliably scanned upon commit. In this paper, we describe a novel approach that leverages prior commit log messages as one means of training a system to automatically flag new commits. Our data-driven approach is designed to complement the hard-wired approaches used by most static and dynamic code analysis tools today. We demonstrate our approach on two major existing projects, Apache HTTP Server (httpd) and Apache Tomcat, both popular web containers used by hundreds of thousands of organizations.
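To make the idea of training on prior commit log messages concrete, the following is a minimal sketch, not the system described in this paper. It assumes a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing, which is one common choice for text classification; the abstract does not state which model is actually used. The labels `flag` and `ok`, the function names, and the six training messages are all hypothetical and written by hand for illustration.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", message.lower())


def train(labeled_messages):
    """Fit a multinomial Naive Bayes model on (message, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training messages
    for message, label in labeled_messages:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(message))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(model, message):
    """Return the label with the highest log-posterior for a new message."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(message):
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written history; a real system would read the
# project's actual commit log and use labels assigned by reviewers.
HISTORY = [
    ("fix buffer overflow in request header parsing", "flag"),
    ("prevent denial of service via malformed chunked request", "flag"),
    ("fix possible information disclosure in error page", "flag"),
    ("update documentation for configuration directives", "ok"),
    ("correct typo in changelog", "ok"),
    ("refactor unit tests for the connector module", "ok"),
]

MODEL = train(HISTORY)

if __name__ == "__main__":
    print(classify(MODEL, "fix overflow in chunked request parsing"))
    print(classify(MODEL, "update changelog typo"))
```

A classifier of this kind learns its vocabulary from the project's own history, so it needs no hand-written rules. In that sense it could sit alongside a rule-based static or dynamic analyzer as a complement.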