SQLIFIX: Learning Based Approach to Fix SQL Injection Vulnerabilities in Source Code

SQL injection is one of the oldest attacks on web applications, yet it remains effective: even in 2020, applications are still vulnerable to it. Developers are supposed to take precautions such as parameterizing SQL queries and escaping special characters. However, developers, especially inexperienced ones, often fail to follow these guidelines. Several SQL injection detection tools exist to expose unaddressed SQL injection vulnerabilities in source code, but, to the best of our knowledge, very little work has been done on suggesting fixes for these vulnerabilities in the source code. We have developed a learning-based approach that builds abstractions of SQL-injection-vulnerable code from a training dataset and groups them using hierarchical clustering. Each test sample is matched with a cluster of similar samples, and a fix suggestion is generated. To evaluate our language-agnostic approach, we built manually validated training and test datasets from real-world Java and PHP projects. In this evaluation, our technique outperforms comparable techniques. The code and dataset are released publicly to encourage reproduction.
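
The pipeline outlined in the abstract (abstract the vulnerable code, cluster the abstractions hierarchically, match a new sample to the closest cluster, and suggest a fix) can be illustrated with a small sketch. This is not the SQLIFIX implementation: the tokenizer, the `API_NAMES` keep-list, the bigram Jaccard distance, the average-linkage merge with a 0.5 cut-off, the toy training samples, and the hand-written fix hints are all assumptions made for illustration only.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical tokenizer: string literals, identifiers (incl. PHP "$x"), punctuation.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|[A-Za-z_$][\w$]*|\S')
API_NAMES = {"executeQuery", "mysql_query"}  # assumed keep-list of query APIs

def abstract(code):
    """Replace literals with STR and non-API identifiers with ID."""
    out = []
    for tok in TOKEN.findall(code):
        if tok[0] in "\"'":
            out.append("STR")
        elif tok[0].isalpha() or tok[0] in "_$":
            out.append(tok if tok in API_NAMES else "ID")
        else:
            out.append(tok)
    return tuple(out)

def distance(a, b):
    """Jaccard distance over token bigrams (an assumed similarity measure)."""
    ga, gb = set(zip(a, a[1:])), set(zip(b, b[1:]))
    union = ga | gb
    return 1.0 - len(ga & gb) / len(union) if union else 0.0

def linkage(c1, c2):
    """Average pairwise distance between two clusters of (abstraction, hint)."""
    return sum(distance(x[0], y[0]) for x in c1 for y in c2) / (len(c1) * len(c2))

def cluster(items, threshold=0.5):
    """Naive agglomerative clustering: merge the closest pair until too far apart."""
    clusters = [[it] for it in items]
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

def suggest(code, clusters):
    """Return the most common fix hint of the cluster closest to the sample."""
    probe = [(abstract(code), None)]
    best = min(clusters, key=lambda c: linkage(probe, c))
    return Counter(hint for _, hint in best).most_common(1)[0][0]

JAVA_HINT = "use PreparedStatement with ? placeholders"
PHP_HINT = "use a prepared statement with bound parameters"
TRAINING = [  # toy samples, not taken from the paper's dataset
    ('stmt.executeQuery("SELECT * FROM users WHERE id = " + userId);', JAVA_HINT),
    ('st.executeQuery("SELECT name FROM items WHERE sku = " + sku);', JAVA_HINT),
    ('mysql_query("SELECT * FROM t WHERE a = \'" . $a . "\'");', PHP_HINT),
    ('mysql_query("SELECT * FROM u WHERE b = \'" . $b . "\'");', PHP_HINT),
]

clusters = cluster([(abstract(code), hint) for code, hint in TRAINING])
sample = 'conn.executeQuery("SELECT * FROM orders WHERE oid = " + request.getParameter("oid"));'
print(len(clusters), "clusters;", "suggested fix:", suggest(sample, clusters))
```

Abstracting identifiers and literals before clustering is what lets samples that differ only in variable names or query text land in the same group, which is one plausible reading of how a single pipeline can serve both Java and PHP code.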
