Natural event summarization

Event mining is a useful way to understand the behavior of computer systems. The focus of recent work on event mining has shifted from discovering frequent patterns to event summarization, which seeks to provide a comprehensible explanation of certain aspects of an event sequence. Previous methods have several limitations: they ignore temporal information, generate the same set of boundaries for all event patterns, and produce summaries that are difficult for humans to understand. In this paper, we propose a novel framework called natural event summarization, which summarizes an event sequence using inter-arrival histograms to capture the temporal relationships among events. Our framework uses the minimum description length principle to guide the summarization process and balance accuracy against brevity. We also use multi-resolution analysis to prune the problem space. We demonstrate how these principles can be applied within the framework to generate summaries with periodic patterns and correlation patterns. Experimental results on synthetic and real data show that our method produces usable event summaries, is robust to noise, and is scalable.
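To make the notion of an inter-arrival histogram concrete, the following is a minimal sketch in Python. It is not the implementation from the paper: the function name `inter_arrival_histogram`, the integer timestamps, and the sample data are all assumptions chosen for illustration. It only shows how gaps between consecutive occurrences of one event type might be counted, so that a periodic pattern appears as a dominant bin.

```python
from collections import Counter


def inter_arrival_histogram(timestamps):
    """Count how often each gap between consecutive occurrences appears.

    Hypothetical helper: assumes timestamps are numbers for a single
    event type; they are sorted before gaps are computed.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return Counter(gaps)


# Made-up data: an event that fires every 5 time units,
# plus one noisy occurrence at t=12.
periodic = [0, 5, 10, 12, 15, 20, 25]
hist = inter_arrival_histogram(periodic)

# The most frequent gap is a candidate period for this event type.
dominant_gap, count = hist.most_common(1)[0]
print(dominant_gap, count)
```

In this sketch the noisy occurrence splits one gap into two smaller ones but leaves the dominant bin intact, which is the intuition behind using histograms rather than exact periods.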
