Matching and indexing sequences of different lengths

In this paper, we consider the problem of efficiently matching and retrieving sequences of different lengths. Most previous research has concentrated on similarity matching and retrieval of sequences of the same length using the Euclidean distance metric. For similarity matching of sequences, we use a modified version of the edit distance function and consider two sequences to match if a majority of their elements match. During matching, a mapping among the non-matching elements is created to check whether there are unacceptable deviations among them. This means that two matching sequences should have comparable lengths. For efficient retrieval of matching sequences, we propose an indexing scheme based entirely on the lengths of sequences and the relative distances between them. We use vp-trees as the underlying distance-based index structures in our method.
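To make the retrieval idea concrete, the following is a minimal Python sketch of distance-based indexing with a vp-tree. It rests on several assumptions that are not taken from the paper: the standard Levenshtein edit distance stands in for the modified edit distance function, whose definition is not given here; the `majority` threshold in `is_match` is a hypothetical reading of the "majority of elements" criterion; and the names `VPNode`, `build`, and `range_search` are illustrative. The sketch also omits the length-based part of the proposed index and shows only the vp-tree component.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Standard Levenshtein distance (assumed stand-in for the modified function)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def is_match(a: Sequence, b: Sequence, majority: float = 0.5) -> bool:
    """Hypothetical criterion: fewer than `majority` of the longer sequence is edited."""
    longer = max(len(a), len(b))
    return longer == 0 or edit_distance(a, b) < majority * longer


@dataclass
class VPNode:
    vantage: Sequence                   # vantage point stored at this node
    radius: int = 0                     # median distance from vantage to the rest
    inside: Optional["VPNode"] = None   # items with distance <= radius
    outside: Optional["VPNode"] = None  # items with distance > radius


def build(items: List[Sequence], rng: random.Random) -> Optional[VPNode]:
    """Build a vp-tree by picking a random vantage point and splitting at the median."""
    if not items:
        return None
    items = list(items)
    node = VPNode(items.pop(rng.randrange(len(items))))
    if items:
        dists = [edit_distance(node.vantage, s) for s in items]
        node.radius = sorted(dists)[len(dists) // 2]
        node.inside = build([s for s, d in zip(items, dists) if d <= node.radius], rng)
        node.outside = build([s for s, d in zip(items, dists) if d > node.radius], rng)
    return node


def range_search(node: Optional[VPNode], query: Sequence, tol: int, out: list) -> list:
    """Collect every indexed sequence within distance `tol` of `query`."""
    if node is None:
        return out
    d = edit_distance(query, node.vantage)
    if d <= tol:
        out.append(node.vantage)
    # The triangle inequality lets a subtree be skipped when it cannot hold a result.
    if d - tol <= node.radius:
        range_search(node.inside, query, tol, out)
    if d + tol > node.radius:
        range_search(node.outside, query, tol, out)
    return out


words = ["kitten", "sitting", "mitten", "kit", "bitten", "knitting", "a", "sit"]
tree = build(words, random.Random(0))
found = sorted(range_search(tree, "kitten", 2, []))
```

Because the edit distance between two sequences is at least the difference of their lengths, any pair accepted by `is_match` necessarily has comparable lengths, which is consistent with the observation above. The pruning in `range_search` is valid only when the distance function satisfies the triangle inequality; whether the modified edit distance does so is not established by this sketch.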
