Paper information - A Consensus Algorithm for Approximate String Matching

Abstract: Approximate string matching (ASM) is a well-known computational problem with important applications in database searching, plagiarism detection, spelling correction, and bioinformatics. The two main issues with most ASM algorithms are (1) computational complexity, and (2) low specificity due to the large number of false positives reported. In this paper, a very efficient ASM method is proposed, along with a post-processing stage designed to significantly reduce the number of false positives. Results with random strings show that the proposed method is capable of performing a search within a large (1 Mb) string in about 100 ms, with a sensitivity and specificity of nearly 100%.

Alfonso Alba | Edgar R. Arce-Santana | Martin O. Mendez | Miguel Rubio | Margarita Rodríguez-Kessler
