Amino Acid Classification and Hash Seeds for Homology Search

Spaced seeds have been studied extensively in homology search. A spaced seed can be regarded as a very special type of hash function on k-mers, where two k-mers have the same hash value if and only if they are identical at the w (w < k) positions designated by the seed. Spaced seeds have substantially increased the sensitivity of homology search. It is then natural to ask whether there is a better hash function (called a hash seed) that provides better sensitivity than the spaced seed. We study this question in this paper. We propose a strategy for classifying amino acids, which leads to a better hash seed. Our results raise a new question: how to design the best hash seed.
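To make the hash-function view concrete, the following is a minimal Python sketch, not the method of the paper. The function names, the seed symbol 'c', and the amino acid grouping in EXAMPLE_CLASSES are all assumptions chosen for illustration; the abstract does not state the classification actually proposed.

```python
def spaced_seed_hash(kmer, seed):
    """Spaced seed as a hash: keep only the characters at '1' positions.

    Two k-mers get the same value exactly when they agree at every
    '1' position of the seed (w positions out of k).
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


# Hypothetical grouping of the 20 amino acids (illustration only,
# not the classification derived in the paper).
EXAMPLE_CLASSES = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
CLASS_OF = {aa: str(i) for i, group in enumerate(EXAMPLE_CLASSES) for aa in group}


def class_hash(kmer, seed):
    """One possible 'hash seed' built on an amino acid classification.

    Seed symbols (assumed notation):
      '1' -> residue must match exactly
      'c' -> residue only needs to fall in the same class
      '0' -> position is ignored
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    out = []
    for c, s in zip(kmer, seed):
        if s == "1":
            out.append(c)
        elif s == "c":
            out.append(CLASS_OF[c])
    return "".join(out)


# Spaced seed with k = 5, w = 4: the third position is a "don't care".
same = spaced_seed_hash("ACGTA", "11011") == spaced_seed_hash("ACCTA", "11011")

# Class-based seed: K and R share a class in the example grouping,
# so the two 3-mers collide even though they are not identical.
relaxed = class_hash("LKD", "1c1") == class_hash("LRD", "1c1")
```

In this sketch the class-based hash is strictly more permissive than the spaced seed at the 'c' positions, which is one way a hash seed could trade specificity for sensitivity; how to choose the classes and positions well is the design question the abstract raises.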
