High-performance complex event processing over XML streams

Much research attention has been given to delivering high-performance systems capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on efficiently processing data with a simple structure, and are limited in their ability to efficiently support complex continuous queries on structured or semi-structured information. Yet XML streams are a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial records, configuration files, and similar applications that require advanced CEP queries. In this paper, we present the XSeq language and system, which support CEP on XML streams via an extension of XPath that is both powerful and amenable to efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams. XSeq is designed to take full advantage of recent advances in the theory of Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising efficiency; previously proposed XPath extensions, by contrast, were not shown to be amenable to efficient implementation. We illustrate XSeq's power for CEP applications through examples from different domains, and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: improvements of two orders of magnitude are obtained over the same queries executed in general-purpose XML engines.
