On-line rule matching for event prediction

Predicting future events is important in many applications. Prediction is based on episode rules, each composed of events and two time constraints: one requires all the events in the rule to occur within a given time interval, and the other requires all the events in the rule's predicate to occur within a given time interval. In an event stream, a sequence of events that matches the predicate of a rule and satisfies the predicate's time constraint is called an occurrence of the predicate. Once an occurrence is found, the consequent event can be predicted to occur within a specific time interval. However, the prediction intervals computed from some occurrences may be contained in those computed from other occurrences, which makes them redundant. Designing an efficient and effective event predictor for a streaming environment is therefore challenging. In this paper, we propose an effective scheme that avoids matching predicate events whose prediction intervals would be redundant. Based on this scheme, we consider two methodologies for efficiently matching predicate events over event streams: forward retrieval and backward retrieval. The forward-retrieval approach constructs a queue structure that incrementally maintains partial matching results as events arrive, and thus avoids backward scans of the event stream. The backward-retrieval approach, in contrast, maintains recently arrived events in a tree structure; the matching of predicate events is triggered by identifiable events and performed by efficient retrieval on the tree, which avoids exhaustive scans of the arrived events. A series of experiments shows that each approach has its advantages under particular data distributions and parameter settings.
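
The abstract does not define the matching semantics precisely, so the following minimal Python sketch fixes one plausible reading as an assumption. The predicate is a serial episode (an ordered list of event types), an occurrence may span at most pred_win, and an occurrence starting at time s and ending at time t predicts the consequent in the interval (t, s + rule_win]. Under this reading, the interval of a later-ending occurrence is contained in an earlier reported one whenever its start is no later than that occurrence's start, so it can be skipped. The names EpisodeRule, ForwardMatcher, pred_win and rule_win are hypothetical. The matcher keeps only the latest start time per predicate prefix; it is a simplified forward-style matcher, not the paper's queue or tree structure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EpisodeRule:
    """Hypothetical rule model: predicate (serial episode) -> consequent."""
    predicate: List[str]  # ordered event types, e.g. ["A", "B"]
    consequent: str       # event type to predict
    pred_win: int         # assumed max span of a predicate occurrence
    rule_win: int         # assumed max span from first predicate event to consequent


class ForwardMatcher:
    """Incremental matcher (a simplification, not the paper's queue structure).

    latest_start[j] holds the latest start time of any match of
    predicate[0..j] seen so far; choosing the latest start for each
    ending event yields the widest prediction interval for that event.
    """

    def __init__(self, rule: EpisodeRule) -> None:
        # Assumed: pred_win < rule_win so every prediction interval is non-empty.
        assert 0 < rule.pred_win < rule.rule_win
        self.rule = rule
        self.latest_start: List[Optional[int]] = [None] * len(rule.predicate)
        # Start of the most recently reported occurrence (reported starts only increase).
        self.last_emitted_start: Optional[int] = None

    def push(self, etype: str, t: int) -> Optional[Tuple[int, int]]:
        """Consume one event at time t (timestamps assumed non-decreasing).

        Returns (low, high), meaning the consequent is predicted in the
        half-open interval (low, high], or None if there is no occurrence
        or its interval is contained in an already reported one.
        """
        pred, k = self.rule.predicate, len(self.rule.predicate)
        result = None
        # Walk positions right to left so one event never fills two positions.
        for j in range(k - 1, -1, -1):
            if pred[j] != etype:
                continue
            start = t if j == 0 else self.latest_start[j - 1]
            if start is None:
                continue  # no match of the shorter prefix yet
            self.latest_start[j] = start
            if j == k - 1 and t - start <= self.rule.pred_win:
                # A later-ending occurrence whose start is <= a reported start has
                # an interval (t, start + rule_win] inside the reported one: skip it.
                if self.last_emitted_start is None or start > self.last_emitted_start:
                    self.last_emitted_start = start
                    result = (t, start + self.rule.rule_win)
        return result


rule = EpisodeRule(["A", "B"], "C", pred_win=5, rule_win=10)
matcher = ForwardMatcher(rule)
stream = [("A", 1), ("A", 3), ("B", 4), ("B", 6), ("A", 7), ("B", 20), ("A", 21), ("B", 22)]
preds = []
for etype, t in stream:
    interval = matcher.push(etype, t)
    if interval is not None:
        preds.append(interval)
print(preds)  # [(4, 13), (22, 31)]
```

In this run, B at time 4 is paired with A at time 3 rather than A at time 1, because the pairing with A at time 1 would give (4, 11], which lies inside (4, 13]. B at time 6 yields (6, 13], again contained in (4, 13], so it is suppressed; B at time 20 is too far from the latest A (time 7) to satisfy pred_win.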
