A General Technique to Improve Filter Algorithms for Approximate String Matching

Approximate string matching searches for occurrences of a pattern in a text, where a certain number of character differences (errors) is allowed. Fast methods use filters: a fast preprocessing phase determines regions of the text where a match cannot occur, so only the remaining text regions must be scrutinized by the slower approximate matching algorithm. Such filters can be very effective, but they naturally degrade at a critical error threshold. We introduce a general technique that improves the efficiency of filters and hence pushes this critical threshold value further out. Our technique intermittently reevaluates the possibility of a match in a given region. It combines precise information about the region already scanned with filtering information about the region yet to be searched. We apply this technique to four approximate string matching algorithms published by Chang & Lawler and Sutinen & Tarhio.
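
To make the filter-then-verify idea concrete, here is a minimal sketch in Python. It is not one of the four algorithms treated in the paper, and it does not implement the intermittent reevaluation the paper proposes. As an assumption for illustration, it uses the well-known pigeonhole filter: if the pattern is split into k + 1 pieces, any occurrence with at most k errors must contain at least one piece unchanged. The function names `sellers_end_positions` and `filtered_search` are hypothetical, and verification uses the standard dynamic programming recurrence for approximate matching.

```python
def sellers_end_positions(pattern, text, k):
    """Slow verifier: return every end position j (exclusive) such that some
    substring of text ending at j is within edit distance k of pattern."""
    m = len(pattern)
    col = list(range(m + 1))          # column 0: D[i][0] = i
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]            # D[0][j-1]
        col[0] = 0                    # a match may start anywhere in the text
        for i in range(1, m + 1):
            cur = col[i]              # D[i][j-1], needed as next diagonal
            col[i] = min(prev_diag + (pattern[i - 1] != c),  # match/substitute
                         cur + 1,                            # extra text char
                         col[i - 1] + 1)                     # missing text char
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


def filtered_search(pattern, text, k):
    """Filter phase (assumed pigeonhole filter) followed by verification
    restricted to the text regions the filter could not rule out."""
    m = len(pattern)
    assert 0 <= k < m, "sketch assumes fewer errors than pattern characters"
    # Split the pattern into k + 1 non-empty pieces.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    windows = []
    for a, b in zip(bounds, bounds[1:]):
        piece = pattern[a:b]
        pos = text.find(piece)
        while pos != -1:
            # An occurrence containing this exact piece lies in this window.
            windows.append((max(0, pos - a - k),
                            min(len(text), pos - a + m + k)))
            pos = text.find(piece, pos + 1)
    # Merge overlapping windows so no region is verified twice.
    windows.sort()
    merged = []
    for lo, hi in windows:
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    # Verify only the surviving regions.
    ends = set()
    for lo, hi in merged:
        for j in sellers_end_positions(pattern, text[lo:hi], k):
            ends.add(lo + j)
    return sorted(ends)
```

Calling `filtered_search("survey", "the surgery and the survey were surveyed", 1)` should return the same end positions as running `sellers_end_positions` over the whole text, while skipping regions that contain none of the pieces. As k grows, the pieces get shorter and occur more often, so more of the text survives the filter; this is the degradation at a critical error threshold that the abstract refers to.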

[1] Esko Ukkonen, et al. Approximate Boyer-Moore String Matching, 1993, SIAM J. Comput.

[2] Esko Ukkonen, et al. Approximate String Matching with q-grams and Maximal Matches, 1992, Theor. Comput. Sci.

[3] Edward M. McCreight, et al. A Space-Economical Suffix Tree Construction Algorithm, 1976, JACM.

[4] Peter H. Sellers, et al. The Theory and Computation of Evolutionary Distances: Pattern Recognition, 1980, J. Algorithms.

[5] Esko Ukkonen, et al. Finding Approximate Patterns in Strings, 1985, J. Algorithms.

[6] Stefan Kurtz, et al. Fundamental algorithms for a declarative pattern matching system, 1995.

[7] Eugene L. Lawler, et al. Sublinear approximate string matching and biological applications, 1994, Algorithmica.

[8] Thomas G. Marr, et al. Approximate String Matching and Local Similarity, 1994, CPM.

[9] Tadao Takaoka, et al. Approximate Pattern Matching with Samples, 1994, ISAAC.

[10] David Haussler, et al. A new distance metric on strings computable in linear time, 1988, Discret. Appl. Math.

[11] Gaston H. Gonnet, et al. A new approach to text searching, 1989, SIGIR '89.

[12] Erkki Sutinen, et al. On Using q-Gram Locations in Approximate String Matching, 1995, ESA.