Clustering Events on Streams Using Complex Context Information

Monitoring applications play an increasingly important role in many domains. They detect events in monitored systems and take actions such as invoking a program or notifying an administrator. Administrators must then often investigate events manually to find the source of a problem. Stream processing engines (SPEs) are general-purpose data management systems for monitoring applications. They provide low-latency stream processing but have limited or no support for manual event investigation. In this paper, we propose a new technique that lets an SPE support event investigation by automatically classifying events on streams. Unlike previous stream clustering algorithms, our approach takes into account complex user-defined contexts for events. Our approach comprises three key components: an event context data model, a distance measure for event contexts, and an online clustering algorithm for event contexts. We evaluate our approach using synthetic data and show that complex context information can improve online event classification.
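To make the three components concrete, the following is a minimal sketch and not the method of the paper. It assumes a deliberately simplified context model (a flat dictionary of numeric attributes), a hypothetical distance function `context_distance` (mean absolute difference over the union of attributes), and a simple threshold-based online assignment rule. The names `Context`, `context_distance`, `OnlineClusterer`, and `threshold` are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a context is a flat map of numeric attributes.
# The paper's "complex user-defined contexts" are richer than this.
Context = Dict[str, float]


def context_distance(a: Context, b: Context) -> float:
    """Hypothetical distance: mean absolute difference over the union
    of attribute names, treating a missing attribute as 0.0."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys) / len(keys)


@dataclass
class OnlineClusterer:
    """Single-pass clustering: each event context is seen once and is
    either assigned to the nearest existing cluster or starts a new one."""
    threshold: float
    distance: Callable[[Context, Context], float] = context_distance
    representatives: List[Context] = field(default_factory=list)

    def assign(self, ctx: Context) -> int:
        """Return the cluster id for ctx, opening a new cluster if no
        existing representative lies within the threshold."""
        best_id, best_d = -1, float("inf")
        for i, rep in enumerate(self.representatives):
            d = self.distance(ctx, rep)
            if d < best_d:
                best_id, best_d = i, d
        if best_id >= 0 and best_d <= self.threshold:
            return best_id
        # The first context of a new cluster serves as its representative.
        self.representatives.append(dict(ctx))
        return len(self.representatives) - 1


clusterer = OnlineClusterer(threshold=0.5)
first = clusterer.assign({"cpu": 0.9, "mem": 0.8})
second = clusterer.assign({"cpu": 0.95, "mem": 0.85})
third = clusterer.assign({"cpu": 0.1, "disk": 5.0})
```

In this sketch the first two contexts fall into the same cluster and the third opens a new one. Passing a different `distance` callable is one way a user-defined notion of context similarity could be plugged in under these assumptions.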
