High Performance Analytics in Complex Event Processing

Complex Event Processing (CEP) is the technology of choice for high-performance analytics in time-critical decision-making applications. Although current CEP systems support sequence pattern detection on continuous event streams, they do not support the computation of aggregated values over the matched sequences of a query pattern. Instead, aggregation is typically applied as a post-processing step after CEP pattern detection, which makes sequence aggregation extremely inefficient. Meanwhile, state-of-the-art aggregation techniques for traditional stream data are not directly applicable to the sequence semantics of CEP. In this paper, we propose an approach, called A-Seq, that pushes the aggregation computation into the sequence pattern detection process. A-Seq computes aggregates online by dynamically maintaining compact partial sequence aggregates, without ever constructing the matched sequences that are to be aggregated. We devise techniques to tackle the key CEP-specific challenges for aggregation, including sliding window semantics, event purging, and sequence negation. For scalability, we further introduce the Chop-Connect methodology, which enables sequence aggregation sharing among queries with arbitrary substring relationships. Lastly, our cost-driven optimizer selects a shared execution plan for effectively processing a workload of CEP aggregation queries. Our experimental study using real data sets demonstrates that A-Seq improves efficiency by over four orders of magnitude compared to state-of-the-art solutions across a wide range of tested scenarios, thus achieving high-performance CEP aggregation analytics.
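
To illustrate the idea of aggregating without constructing matched sequences, the following is a minimal sketch of COUNT over a sequence pattern using per-prefix counters. It is an illustration under simplifying assumptions, not the A-Seq algorithm as specified in the paper: events are reduced to their type names, and sliding windows, event purging, and negation are all ignored. The names `count_sequence_matches` and `prefix_counts` are hypothetical.

```python
from typing import Iterable, List


def count_sequence_matches(stream: Iterable[str], pattern: List[str]) -> int:
    """Count matches of an ordered event-type pattern without storing any match.

    Assumption: each event is represented only by its type name, and the
    whole stream is one unbounded window (no expiry, no negation).
    """
    # prefix_counts[i] = number of matches of pattern[:i] seen so far.
    prefix_counts = [0] * (len(pattern) + 1)
    prefix_counts[0] = 1  # the empty prefix is matched exactly once

    for event_type in stream:
        # Walk positions right to left so that one event cannot extend
        # a prefix that the same event has just created.
        for i in range(len(pattern), 0, -1):
            if pattern[i - 1] == event_type:
                prefix_counts[i] += prefix_counts[i - 1]

    return prefix_counts[len(pattern)]


if __name__ == "__main__":
    events = ["A", "A", "B", "C", "B", "C"]
    print(count_sequence_matches(events, ["A", "B", "C"]))
```

The state kept per query is one counter per pattern position, independent of how many matches exist, which is the contrast with a detect-then-aggregate pipeline that would materialize every matched sequence first.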
