Survey on index based homology search algorithms

Abstract: Many homology search algorithms have been proposed and studied, but a sound classification scheme and a comprehensive comparison of them are still lacking. This is especially true for index based homology search algorithms. This paper briefly introduces the main index construction methods. According to how the index is constructed, index based homology search algorithms are classified into three categories: length based index algorithms, transformation based index algorithms, and combinations of the two. Based on this classification, the characteristics of the currently popular index based homology search algorithms are compared and analyzed. Several promising new index techniques are also discussed. Taken together, the paper provides a survey of index based homology search algorithms.
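To make the first category more concrete, the following is a minimal sketch of a length based index. It assumes that "length based" means indexing fixed-length substrings (k-mers) of the database sequences, which the abstract does not spell out. The function names `build_kmer_index` and `candidate_hits`, the toy sequences, and the choice k = 4 are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map each length-k substring to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def candidate_hits(index, query, k):
    """Count indexed k-mer occurrences matched by the query, per database sequence.

    Sequences with high counts are candidates for a full alignment step,
    which this sketch does not perform.
    """
    counts = defaultdict(int)
    for offset in range(len(query) - k + 1):
        for seq_id, _ in index.get(query[offset:offset + k], ()):
            counts[seq_id] += 1
    return dict(counts)


# Hypothetical toy database; real collections are far larger.
database = ["ACGTACGT", "TTTTACGA", "GGGGCCCC"]
index = build_kmer_index(database, k=4)
hits = candidate_hits(index, "TACGT", k=4)
```

The index acts only as a filter here: sequences that share no k-mer with the query never appear in `hits`, so only the remaining candidates would need to be passed to a more expensive alignment algorithm.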
