Expressing and optimizing sequence queries in database systems

The need to search for complex and recurring patterns in database sequences is shared by many applications. In this paper, we investigate the design and optimization of a query language capable of expressing, and efficiently supporting, the search for complex sequential patterns in database systems. We first introduce SQL-TS, an extension of SQL for expressing these patterns, and then study how to optimize queries in this language. We take the optimal text-search algorithm of Knuth, Morris and Pratt and generalize it to handle complex queries on sequences. Our algorithm exploits the interdependencies between the elements of a pattern to minimize repeated passes over the same data. Experimental results on typical sequence queries, such as double-bottom queries, confirm that our new optimization techniques achieve substantial speedups.
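The KMP generalization described above can be illustrated with a deliberately simplified sketch. It assumes each pattern element is a condition on a single numeric value, modeled as a closed interval. Because containment and disjointness of intervals can be decided from the pattern alone, the tables `shift` and `resume` are precomputed before any data is read. After a partial match, a candidate shift is skipped if a value known to satisfy one condition would now have to satisfy a contradictory one, or if the value that just failed is guaranteed to fail again; conditions already implied by known facts are not re-tested. All names here (`Interval`, `build_tables`, `search`, `naive_search`) are hypothetical. This is not the paper's algorithm or SQL-TS's evaluation strategy, and conditions that compare different tuples of the pattern with each other are outside its scope.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """Hypothetical element predicate lo <= x <= hi (assumes lo <= hi)."""
    lo: float
    hi: float

    def holds(self, x: float) -> bool:
        return self.lo <= x <= self.hi

    def implies(self, other: "Interval") -> bool:
        # Every value satisfying self also satisfies other (containment).
        return other.lo <= self.lo and self.hi <= other.hi

    def contradicts(self, other: "Interval") -> bool:
        # No value satisfies both predicates (empty intersection).
        return max(self.lo, other.lo) > min(self.hi, other.hi)


def build_tables(pattern: Sequence[Interval]) -> Tuple[List[int], List[int]]:
    """shift[j], resume[j] for j = elements matched when the scan stopped
    (j < m: element j failed; j == m: full match). Uses only the pattern."""
    m = len(pattern)
    if m == 0:
        raise ValueError("empty pattern")
    shift, resume = [0] * (m + 1), [0] * (m + 1)
    for j in range(m + 1):
        failed = j < m  # input element i+j is known to FAIL pattern[j]
        for s in range(1, j + 2 if failed else j + 1):
            overlap = range(s, j)  # element i+k satisfies pattern[k], k < j
            if any(pattern[k].contradicts(pattern[k - s]) for k in overlap):
                continue  # alignment i+s cannot match: skip without reading
            if failed and s <= j and pattern[j - s].implies(pattern[j]):
                continue  # element i+j fails pattern[j], so fails pattern[j-s]
            shift[j] = s
            # Resume at the first overlapped position whose new predicate is
            # not implied by known facts; else at the first untested one.
            resume[j] = next((k - s for k in overlap
                              if not pattern[k].implies(pattern[k - s])),
                             max(j - s, 0))
            break
    return shift, resume


def search(pattern: Sequence[Interval], xs: Sequence[float]) -> Tuple[List[int], int]:
    """Return (start offsets of all matches, number of predicate tests)."""
    m, n = len(pattern), len(xs)
    shift, resume = build_tables(pattern)
    matches, tests = [], 0
    i, t = 0, 0  # i: alignment start; t: next pattern index to test
    while i + m <= n:
        while t < m:
            tests += 1
            if not pattern[t].holds(xs[i + t]):
                break
            t += 1
        if t == m:
            matches.append(i)
        i, t = i + shift[t], resume[t]
    return matches, tests


def naive_search(pattern: Sequence[Interval], xs: Sequence[float]) -> Tuple[List[int], int]:
    """Baseline: re-test the whole pattern at every alignment."""
    m, n = len(pattern), len(xs)
    matches, tests = [], 0
    for i in range(n - m + 1):
        for t in range(m):
            tests += 1
            if not pattern[t].holds(xs[i + t]):
                break
        else:
            matches.append(i)
    return matches, tests


if __name__ == "__main__":
    # Hypothetical W-shaped pattern over daily price changes (a crude
    # stand-in for a double bottom): fall, rise, fall, rise.
    down, up = Interval(-math.inf, -1), Interval(1, math.inf)
    deltas = [-2, 3, -1, -4, 2, -3, 5, 0, -1, 2]
    print(search([down, up, down, up], deltas))        # ([3], 9)
    print(naive_search([down, up, down, up], deltas))  # ([3], 16)
```

On this toy input both functions find the single W-shaped run starting at offset 3, while the sketch performs 9 predicate tests against 16 for the naive rescan. With point intervals such as `Interval(c, c)`, the two tables behave like the failure function of the classic KMP string-search algorithm.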

References

[1] Robert S. Boyer, et al. A fast string searching algorithm, 1977, CACM.

[2] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[3] Harry B. Hunt, et al. Processing Conjunctive Predicates and Queries, 1980, VLDB.

[4] Richard M. Karp, et al. Efficient Randomized Pattern-Matching Algorithms, 1987, IBM J. Res. Dev.

[5] Anthony C. Klug. On conjunctive queries containing inequalities, 1988, JACM.

[6] Jeffrey D. Ullman, et al. Principles of database and knowledge-base systems, Vol. I, 1988.

[7] Lionel M. Ni, et al. Processing Implication on Queries, 1989, IEEE Transactions on Software Engineering.

[8] Jeffrey D. Ullman. Principles of database and knowledge-base systems, 1989.

[9] Narain H. Gehani, et al. Composite Event Specification in Active Databases: Model & Implementation, 1992, VLDB.

[10] Hendrik Segers, et al. Composite event specification in active databases: model and implementation, 1992.

[11] Clement T. Yu, et al. Semantic Query Optimization for Tree and Chain Queries, 1994, IEEE Trans. Knowl. Data Eng.

[12] Miron Livny, et al. Sequence query processing, 1994, SIGMOD '94.

[13] Christos Faloutsos, et al. Fast subsequence matching in time-series databases, 1994, SIGMOD '94.

[14] Richard R. Muntz, et al. Extracting spatio-temporal patterns from geoscience datasets, 1994, Proceedings of Workshop on Visualization and Machine Vision.

[15] Miron Livny, et al. SEQ: A model for sequence databases, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[16] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[17] Wei Sun, et al. Solving satisfiability and implication problems in database systems, 1996, TODS.

[18] Mark Allen Weiss, et al. On Satisfiability, Equivalence, and Implication Problems Involving Conjunctive Queries in Database Systems, 1996, IEEE Trans. Knowl. Data Eng.

[19] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.

[20] Carlo Zaniolo, et al. Temporal aggregation in active database rules, 1997, SIGMOD '97.

[21] Michael J. A. Berry, et al. Data mining techniques: for marketing, sales, and customer support, 1997, Wiley computer publishing.

[22] Praveen Seshadri, et al. PREDATOR: a resource for database research, 1998, SIGMOD Record.

[23] Raghu Ramakrishnan, et al. SRQL: Sorted Relational Query Language, 1998, Proceedings. Tenth International Conference on Scientific and Statistical Database Management.

[24] Douglas Stott Parker, et al. SQL/LPP: A Time Series Extension of SQL Based on Limited Patience Patterns, 1999, DEXA.

[25] Alberto O. Mendelzon, et al. Querying Time Series Data Based on Similarity, 2000, IEEE Trans. Knowl. Data Eng.

[26] Carlo Zaniolo, et al. Using SQL to Build New Aggregates and Extenders for Object-Relational Systems, 2000, VLDB.

[27] Carlo Zaniolo, et al. Optimization of sequence queries in database systems, 2001, PODS '01.

[28] Divesh Srivastava, et al. Two-dimensional substring indexing, 2001, J. Comput. Syst. Sci.

[29] Haixun Wang, et al. The S2-Tree: An Index Structure for Subsequence Matching of Spatial Objects, 2001, PAKDD.

[30] N. Koudas, et al. Two-dimensional substring indexing, 2001, PODS '01.

[31] Jennifer Widom, et al. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations, 2002.

[32] Michael Stonebraker, et al. Monitoring Streams - A New Class of Data Management Applications, 2002, VLDB.

[33] Jennifer Widom, et al. Models and issues in data stream systems, 2002, PODS.

[34] Frederick Reiss, et al. TelegraphCQ: continuous dataflow processing, 2003, SIGMOD '03.

[35] Frederick Reiss, et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World, 2003, CIDR.

[36] Gerhard Weikum, et al. ACM Transactions on Database Systems, 2005.