Intelligent Clustering Scheme for Log Data Streams

Mining patterns from log messages is valuable for real-time analysis and for detecting faults, anomalies, and security threats. A data-streaming algorithm with an efficient pattern-finding approach is a more practical way to classify these ubiquitous logs. In this paper, the authors therefore propose a novel online approach for finding patterns in log data sets, in which a locally sensitive signature is generated for similar log messages. The similarity of log messages is identified by parsing them and then logically analyzing the signature bit streams associated with them. In addition, the approach is intelligent enough to reflect the change when a totally new log appears in the system. The proposed method is validated against SLCT by comparing the F-measure of the clustering results for labeled datasets and, for unlabeled datasets, the percentage of matching word order among the log messages in a cluster.
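The abstract does not give the signature construction, so the following is only an illustrative sketch of the general idea: derive a bit signature from the parsed tokens of each message, compare signatures bit by bit, and open a new cluster when a message matches none seen so far. It uses a SimHash-style signature as a stand-in and is not the authors' method. The 64-bit width, the masking of tokens that contain digits, the Hamming-distance threshold, and all names (`tokenize`, `signature`, `OnlineClusterer`) are assumptions made for illustration.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper


def tokenize(message):
    """Split on whitespace and mask tokens containing digits (assumed to be variables)."""
    return ["<*>" if any(ch.isdigit() for ch in tok) else tok
            for tok in message.split()]


def token_hash(text):
    """Stable 64-bit hash of a string, taken from the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def signature(message):
    """SimHash-style signature: each position-tagged token votes on every bit."""
    votes = [0] * SIGNATURE_BITS
    for pos, tok in enumerate(tokenize(message)):
        h = token_hash(f"{pos}:{tok}")  # include position so word order matters
        for bit in range(SIGNATURE_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    sig = 0
    for bit in range(SIGNATURE_BITS):
        if votes[bit] > 0:
            sig |= 1 << bit
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


class OnlineClusterer:
    """Assigns each incoming message to the first cluster within max_distance bits."""

    def __init__(self, max_distance=8):  # threshold is a hypothetical choice
        self.max_distance = max_distance
        self.representatives = []  # signature of the first message in each cluster

    def assign(self, message):
        sig = signature(message)
        for cluster_id, rep in enumerate(self.representatives):
            if hamming(sig, rep) <= self.max_distance:
                return cluster_id
        # No existing cluster is close enough: a new kind of log has appeared.
        self.representatives.append(sig)
        return len(self.representatives) - 1


clusterer = OnlineClusterer()
first = clusterer.assign("Connection from 10.0.0.1 port 22")
second = clusterer.assign("Connection from 10.0.0.2 port 2200")
```

In this sketch the two example messages differ only in tokens that contain digits, so they reduce to the same masked token sequence and receive the same signature and cluster. A message with a different template would typically differ in many bits and start a new cluster, which is how a one-pass scheme of this kind can react to previously unseen logs without reprocessing earlier data.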
