Data-Driven Vulnerability Detection and Repair in Java Code

The Java platform provides various APIs to facilitate secure coding. However, using security APIs correctly is often challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs; such misuses can introduce vulnerabilities into software, void security protections, and expose exploitable weaknesses to attackers. To eliminate such API-related vulnerabilities, this paper presents Seader, our new approach that detects and repairs security API misuses. Given an exemplar insecure code snippet and its secure counterpart, Seader compares the snippets and conducts data dependence analysis to infer the security API misuse template and the corresponding fixing operations. Based on the inferred information, given a program, Seader performs interprocedural static analysis to search for security API misuses and to propose customized fixes for the vulnerabilities it finds. To evaluate Seader, we applied it to 25 <insecure, secure> code pairs, and Seader successfully inferred 18 unique API misuse templates and related fixes. With these vulnerability repair patterns, we further applied Seader to 10 open-source projects that contain a total of 32 known vulnerabilities. Our experiment shows that Seader detected vulnerabilities with 100% precision, 84% recall, and 91% accuracy. Additionally, we applied Seader to 100 Apache open-source projects and detected 988 vulnerabilities; Seader customized the repair suggestion correctly in every case. Based on Seader’s outputs, we filed 60 pull requests. To date, developers of 18 projects have offered positive feedback on Seader’s suggestions. Our results indicate that Seader can effectively help developers detect and fix security API misuses. Whereas prior work either detects API misuses or suggests simple fixes, Seader is the first tool to do both for nontrivial vulnerability repairs.
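To make the idea of an <insecure, secure> code pair concrete, the following is a minimal sketch and not Seader's actual implementation. It assumes a pair that differs only in one string literal passed to an API call, uses a regular expression in place of the snippet comparison and data dependence analysis described above, and scans a single method's text rather than performing interprocedural analysis. The class name `MisuseTemplateSketch`, the `Template` type, and the `infer` and `scan` helpers are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: real template inference would work on syntax trees
// and data dependences, not on regular expressions over source text.
class MisuseTemplateSketch {

    // Hypothetical template: an API call plus its insecure and secure string argument.
    static final class Template {
        final String api;
        final String insecureArg;
        final String secureArg;

        Template(String api, String insecureArg, String secureArg) {
            this.api = api;
            this.insecureArg = insecureArg;
            this.secureArg = secureArg;
        }
    }

    // Matches calls of the form qualified.name("literal").
    private static final Pattern CALL =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_.]*)\\(\"([^\"]*)\"\\)");

    // Infers a template from a pair whose matching calls differ in one string literal.
    static Template infer(String insecure, String secure) {
        Matcher a = CALL.matcher(insecure);
        Matcher b = CALL.matcher(secure);
        while (a.find() && b.find()) {
            if (a.group(1).equals(b.group(1)) && !a.group(2).equals(b.group(2))) {
                return new Template(a.group(1), a.group(2), b.group(2));
            }
        }
        return null; // the pair does not fit this toy template shape
    }

    // Reports each line that matches the insecure call, with a fix that keeps
    // the scanned program's own variable names.
    static List<String> scan(String program, Template t) {
        List<String> suggestions = new ArrayList<>();
        String bad = t.api + "(\"" + t.insecureArg + "\")";
        String good = t.api + "(\"" + t.secureArg + "\")";
        String[] lines = program.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains(bad)) {
                suggestions.add("line " + (i + 1) + ": " + lines[i].trim().replace(bad, good));
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Exemplar pair: a weak hash algorithm and a stronger replacement.
        Template t = infer(
            "MessageDigest md = MessageDigest.getInstance(\"MD5\");",
            "MessageDigest md = MessageDigest.getInstance(\"SHA-256\");");

        String program = "byte[] hash(byte[] in) throws Exception {\n"
            + "    MessageDigest digest = MessageDigest.getInstance(\"MD5\");\n"
            + "    return digest.digest(in);\n"
            + "}";

        scan(program, t).forEach(System.out::println);
    }
}
```

Running `main` prints `line 2: MessageDigest digest = MessageDigest.getInstance("SHA-256");`. The suggestion keeps the scanned program's variable name `digest` rather than the exemplar's `md`, which is a small-scale analogue of the customized fixing suggestions mentioned above.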
