Indexing of Sequences of Sets for Efficient Exact and Similar Subsequence Matching

Object-relational database management systems allow users to define complex data types, such as objects, collections, and nested tables. Unfortunately, most commercially available database systems support neither efficient querying nor indexing of complex attributes. Several indexing schemes for complex data types have been proposed in the literature, but most of them are tailored to specific applications. The lack of a single universal indexing technique for attributes containing sets and sequences of values significantly hinders the practical usability of these data types in user applications. In this paper we present a novel indexing technique for sequence-valued attributes. Our index supports not only sequences of values but also sequences of sets of values. Experimental evaluation demonstrates the feasibility and benefit of the index for exact and similar subsequence matching.
