Automatic Graph-Based Clustering for Security Logs

Computer security events are recorded in several log files. Clustering these logs is necessary to discover security threats, detect anomalies, or identify a particular error. A problem arises when large quantities of security log data need to be checked, because existing tools do not provide sufficiently sophisticated grouping results. In addition, existing methods require user-supplied parameters, and finding optimal values for these is not trivial. Therefore, we propose a method for the automatic clustering of security logs. First, we present a new graph-theoretic approach for security log clustering based on maximal clique percolation. Second, we apply an intensity threshold to the obtained maximal cliques so that edge weights are taken into account before percolation proceeds. Third, we use the simulated annealing algorithm to optimize the number of percolations and the intensity threshold for maximal clique percolation. The entire process is automatic and does not need any user input. Experimental results on various real-world datasets show that the proposed method achieves superior clustering results compared to other methods.
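
The first two steps can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NetworkX is available, that the intensity of a clique is the geometric mean of its edge weights (the definition used in weighted clique percolation), and that two maximal cliques percolate into the same cluster when they share at least k - 1 nodes. The function names `clique_intensity` and `percolate`, the toy graph, and its weights are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

import networkx as nx


def clique_intensity(graph, clique):
    """Geometric mean of the edge weights inside a clique (assumed definition)."""
    weights = [graph[u][v]["weight"] for u, v in combinations(clique, 2)]
    return math.exp(sum(math.log(w) for w in weights) / len(weights))


def percolate(graph, k, intensity_threshold):
    """Cluster nodes by percolating maximal cliques that pass the threshold.

    Assumes k >= 2 and strictly positive edge weights.
    """
    # Maximal cliques with at least k nodes.
    cliques = [frozenset(c) for c in nx.find_cliques(graph) if len(c) >= k]
    # Keep only cliques whose intensity reaches the threshold.
    cliques = [
        c for c in cliques
        if clique_intensity(graph, c) >= intensity_threshold
    ]
    # Two cliques are adjacent when they share at least k - 1 nodes.
    overlap = nx.Graph()
    overlap.add_nodes_from(cliques)
    for a, b in combinations(cliques, 2):
        if len(a & b) >= k - 1:
            overlap.add_edge(a, b)
    # Each connected component of the overlap graph is one cluster.
    return [
        frozenset().union(*component)
        for component in nx.connected_components(overlap)
    ]


# Hypothetical toy graph: nodes stand for log entries, weights for similarity.
graph = nx.Graph()
graph.add_weighted_edges_from([
    (0, 1, 0.9), (0, 2, 0.9), (1, 2, 0.9),  # tightly similar group
    (3, 4, 0.8), (3, 5, 0.8), (4, 5, 0.8),  # second group
    (2, 3, 0.1),                            # weak link between the groups
])

clusters = percolate(graph, k=3, intensity_threshold=0.5)
print(sorted(sorted(c) for c in clusters))
```

With these toy weights, both triangles pass a threshold of 0.5 and remain separate clusters, while raising the threshold to 0.85 keeps only the first one. In the method described above, k and the intensity threshold would not be chosen by hand but searched by simulated annealing, which this sketch does not cover.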
