The Analysis of Firewall Policy Through Machine Learning and Data Mining

Firewalls are primary components for ensuring network and information security. For this purpose, they are deployed in commercial, governmental and military networks as well as in other large-scale networks. The security policies of an institution are implemented as firewall rules, and an anomaly in these rules may lead to serious security gaps. When the network is large and the policies are complicated, manual cross-checking may be insufficient to detect anomalies. In this paper, an automated model based on machine learning and high performance computing methods is proposed for detecting anomalies in a firewall rule repository. To achieve this, firewall logs are analysed and the extracted features are fed to a set of machine learning classification algorithms including Naive Bayes, kNN, Decision Table and HyperPipes. F-measure, which combines precision and recall, is used for performance evaluation. In the experiments, kNN showed the best performance. A model based on the F-measure distribution was then devised and used to analyse 93 firewall rules. The model predicted that 6 of these rules cause anomalies. These problematic rules were checked against the security reports prepared by experts, and each of them was verified to be an anomaly. This paper shows that anomalies in firewall rules can be detected automatically by analysing large-scale log files with machine learning methods, which helps avoid security breaches, saves a substantial amount of expert effort and enables timely intervention.
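As an illustration of the evaluation metric named above, the following minimal Python sketch computes the F-measure from true positive, false positive and false negative counts. The abstract does not state which variant of the F-measure was used, so the balanced form (beta = 1) is an assumption here, and the function name `f_measure` and the counts in the usage lines are hypothetical values chosen for illustration, not results from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-measure for the given confusion counts.

    tp, fp, fn are true positives, false positives and false negatives.
    beta = 1.0 gives the balanced F1 score (an assumption; the abstract
    does not specify the weighting).
    """
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one class of log records (not from the paper).
score = f_measure(tp=8, fp=2, fn=2)
print(f"F-measure: {score:.2f}")
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are high, which is why it is often preferred over plain accuracy when the classes are imbalanced.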
