Automatic Detection of NoSQL Injection Using Supervised Learning

With the advance of big data, NoSQL databases are growing steadily in popularity. Their increasing use in large applications also brings security concerns to the fore. SQL injection has long been one of the major security threats, and recent studies reveal that NoSQL databases are vulnerable to injection as well. However, NoSQL security has yet to receive the attention it deserves from either industry or academia. In this work, we develop a tool for detecting NoSQL injections using supervised learning. To the best of our knowledge, the training dataset we developed for NoSQL injection is the first of its kind. We manually design important features and apply various supervised learning algorithms. Our tool achieves an F2-score of 0.93, as established by 10-fold cross-validation. We also apply our tool to NoSQLMap, a NoSQL injection generation tool, and find that it outperforms Sqreen, the only available NoSQL injection detection tool, by 36.25% in terms of detection rate. The proposed technique is also shown to be database-agnostic, achieving similar performance on injections against MongoDB and CouchDB databases.
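
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how an F2-score and a 10-fold split can be computed. It is not the authors' evaluation code: the function names `f_beta` and `k_fold_indices` and the confusion counts in the usage note are hypothetical and chosen only for illustration. The F-beta score with beta = 2 weights recall more heavily than precision, which suits detection tasks where a missed injection is costlier than a false alarm.

```python
from typing import Iterator, List, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts; beta > 1 weights recall over precision."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

As an illustrative usage, `f_beta(tp=90, fp=30, fn=10)` corresponds to a precision of 0.75 and a recall of 0.90; because recall exceeds precision here, the F2 value is higher than the F1 value returned by `f_beta(90, 30, 10, beta=1.0)`. In practice the samples would be shuffled (and usually stratified by class) before being split into folds.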
