Mining SQL Queries to Detect Anomalous Database Access using Random Forest and PCA

Data have become a very important asset to many organizations, companies, and individuals, and thus the security of the relational databases that store these data has become a major concern. Standard database security mechanisms, as well as network-based and host-based intrusion detection systems, have proven ineffective at detecting malicious attacks directed specifically at databases. There is therefore a pressing need for an intrusion detection system (IDS) designed specifically for databases. In this paper, we propose using the random forest (RF) algorithm as the core anomaly detection mechanism, in conjunction with principal component analysis (PCA) for dimension reduction. Experiments show that PCA produces a very compact, meaningful set of features, while RF, a tree-based method that is well placed to exploit the inherent tree structure of SQL queries, performs consistently well in terms of false positive rate, false negative rate, and time complexity, even as the number of features varies.
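The pipeline described in the abstract (SQL queries turned into feature vectors, reduced with PCA, then classified by a random forest) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the identifier-based encoding in `query_features`, the toy `VOCAB` and `TRAIN` data, all hyperparameters, and the role-based rule in `is_anomalous` (flag a query when the forest's predicted role differs from the issuing user's role, a scheme used in some related work) are all hypothetical. PCA is computed by power iteration with deflation; each tree is grown on a bootstrap sample and tries a random subset of features at every split, and the trees vote by majority, as in a standard random forest.

```python
import random
from collections import Counter

# Hypothetical feature vocabulary: tables and columns a query may mention.
VOCAB = ["orders", "amount", "customer_id", "employees", "salary", "name"]

def query_features(sql):
    """Crude 0/1 vector marking which VOCAB identifiers appear in a SQL string."""
    tokens = set(sql.lower().translate(str.maketrans(",=()", "    ")).split())
    return [1.0 if v in tokens else 0.0 for v in VOCAB]

def pca_fit(X, k, iters=100, seed=0):
    """Top-k principal components of X via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # no variance left in this subspace
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps

def pca_transform(X, mean, comps):
    """Project each centered row of X onto the retained components."""
    return [[sum((r - m) * cj for r, m, cj in zip(row, mean, c)) for c in comps] for row in X]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, depth, max_depth, n_feat):
    """Grow a CART-style tree; each node tries a random subset of n_feat features."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for f in rng.sample(range(len(X[0])), n_feat):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split on the sampled features
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], rng, depth + 1, max_depth, n_feat)
    return (f, t, grow(left), grow(right))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=5, seed=0):
    rng = random.Random(seed)
    n_feat = max(1, int(len(X[0]) ** 0.5))  # features tried per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 rng, 0, max_depth, n_feat))
    return forest

def is_anomalous(forest, mean, comps, sql, claimed_role):
    """Assumed role-based rule: flag a query whose predicted role differs from the claimed one."""
    z = pca_transform([query_features(sql)], mean, comps)[0]
    votes = Counter(predict_tree(tree, z) for tree in forest)  # majority vote
    return votes.most_common(1)[0][0] != claimed_role

# Toy training log (hypothetical): each query paired with the issuing user's role.
TRAIN = [
    ("SELECT amount FROM orders WHERE customer_id = 7", "clerk"),
    ("UPDATE orders SET amount = 5 WHERE customer_id = 3", "clerk"),
    ("SELECT customer_id, amount FROM orders", "clerk"),
    ("SELECT name, salary FROM employees", "hr"),
    ("UPDATE employees SET salary = 100 WHERE name = 'bob'", "hr"),
    ("SELECT salary FROM employees WHERE name = 'ann'", "hr"),
]
X, y = [query_features(q) for q, _ in TRAIN], [role for _, role in TRAIN]
mean, comps = pca_fit(X, k=1)
forest = fit_forest(pca_transform(X, mean, comps), y)
```

With this toy log, every clerk query maps to one feature vector and every hr query to another, so the covariance has rank one and a single component separates the roles; `is_anomalous(forest, mean, comps, "SELECT name, salary FROM employees", "clerk")` then flags the query. Real query logs would need a richer encoding and more components, and the sign of each principal component is arbitrary.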
