Optimizing Expensive Queries in Complex Event Processing

Pattern queries are widely used in complex event processing (CEP) systems. Existing pattern matching techniques, however, provide only limited performance for expensive queries in real-world applications, which may involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. To support these expensive queries with high performance, we begin by analyzing the complexity of pattern queries, focusing on which features make pattern queries more expressive and, at the same time, more computationally expensive. This analysis allows us to identify performance bottlenecks in processing such queries and provides key insights for developing a series of optimizations to mitigate those bottlenecks. Microbenchmark results show superior performance of our system for expensive pattern queries, whereas most state-of-the-art systems perform poorly on them. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of our proposed techniques.
