Paper information - Optimal spaced seeds for faster approximate string matching


Filtering is a standard technique for fast approximate string matching in practice. In filtering, a quick first step rules out almost all positions of a text as possible starting positions for a pattern. Typically this step consists of finding exact matches of small parts of the pattern. In the follow-up step, a slow method is used to verify or eliminate each remaining position. The running time of such a method depends largely on the quality of the filtering step, as measured by its false positive rate. The quality of such a method depends on the number of true matches that it misses, that is, on its false negative rate.
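The two-step scheme described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm or its optimal seeds: the seed mask `"1101"`, the function names, and the choice of Hamming distance (at most `k` mismatches) for verification are all assumptions made for illustration. In the mask, `1` marks a position that must match exactly and `0` marks a position that is ignored. The filter step may miss true matches (false negatives) for some combinations of seed and `k`; the sketch makes no attempt to guarantee otherwise.

```python
from collections import defaultdict


def project(s, start, seed):
    """Read the characters of s under the '1' positions of the seed, starting at start."""
    return "".join(s[start + i] for i, c in enumerate(seed) if c == "1")


def filter_candidates(text, pattern, seed):
    """Filter step: keep only the start positions supported by at least one seed hit."""
    n, m, w = len(text), len(pattern), len(seed)
    # Index every seed-shaped window of the text by its projected characters.
    index = defaultdict(list)
    for j in range(n - w + 1):
        index[project(text, j, seed)].append(j)
    candidates = set()
    for i in range(m - w + 1):
        # A hit between pattern offset i and text offset j implies start j - i.
        for j in index.get(project(pattern, i, seed), ()):
            start = j - i
            if 0 <= start <= n - m:
                candidates.add(start)
    return candidates


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(text, pattern, k, seed="1101"):
    """Verification step: check each surviving candidate against the mismatch bound k."""
    m = len(pattern)
    return [
        s
        for s in sorted(filter_candidates(text, pattern, seed))
        if hamming(text[s:s + m], pattern) <= k
    ]
```

For example, `approximate_matches("ACGTACGTTACGAACGT", "ACGAACGT", 1)` returns `[0, 9]`: position 9 is an exact occurrence, and position 0 differs from the pattern in one character. Candidates that pass the filter but fail verification are the false positives that drive the running time; true matches that never become candidates are the false negatives that determine the quality.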
