Improved hit criteria for DNA local alignment

Background: The hit criterion is a key component of heuristic local alignment algorithms. It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method.

Results: In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion, which combines the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds, which extend spaced seeds by distinguishing transition mismatches from transversion mismatches. We provide analytical data as well as experimental results, obtained with the YASS software, supporting both improvements.

Conclusions: The proposed algorithmic ideas yield a significant gain in the sensitivity of similarity search without an increase in execution time. The method has been implemented in the YASS software, available at http://www.loria.fr/projects/YASS/.
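To make the idea of a transition-constrained seed concrete, the following is a minimal sketch, not the YASS implementation. It assumes a three-symbol seed alphabet chosen here for illustration: `#` requires a match, `@` accepts a match or a transition (A<->G, C<->T), and `-` accepts any pair. The function names `seed_hits` and `find_hits` are hypothetical, and the sketch only handles ungapped, equal-length sequence pairs.

```python
# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T) substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hits(seed, s1, s2):
    """Return True if the aligned fragments s1 and s2 satisfy the seed.

    Assumed seed symbols (illustrative convention):
      '#' : the two nucleotides must match
      '@' : the two nucleotides must match or differ by a transition
      '-' : any pair is accepted (don't-care position)
    """
    if not (len(seed) == len(s1) == len(s2)):
        raise ValueError("seed and both fragments must have the same length")
    for sym, a, b in zip(seed, s1, s2):
        if sym == "#":
            if a != b:
                return False
        elif sym == "@":
            if a != b and (a, b) not in TRANSITIONS:
                return False
        elif sym != "-":
            raise ValueError(f"unknown seed symbol: {sym!r}")
    return True


def find_hits(seed, x, y):
    """Return the offsets at which the seed hits an ungapped alignment of x and y."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    span = len(seed)
    return [
        i
        for i in range(len(x) - span + 1)
        if seed_hits(seed, x[i:i + span], y[i:i + span])
    ]
```

With this convention, a seed containing only `#` and `-` behaves as an ordinary spaced seed. Replacing some `-` positions with `@` rejects transversions at those positions while still tolerating transitions there, which is the distinction the abstract describes.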
