Search domain elimination of genomic databases using a new percentage approximation technique

Search tools for genomic databases play an important role in the development of biotechnology and related fields. The Smith-Waterman algorithm (SW) is probably the most precise algorithm for searching genomic databases, but it is computationally expensive. This paper proposes a new method, a percentage approximation (PA) module, that can be attached to SW to increase its speed while preserving its accuracy. The approach aims to improve the performance of search algorithms in terms of both speed and precision. According to the experiments, the PA module reduces the processing time and search domain of SW by 7 to 15 percent while maintaining 100 percent of the accuracy of the original algorithm.
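To make the idea of shrinking the search domain in front of SW concrete, the following is a minimal sketch only. It does not reproduce the percentage approximation technique, whose details are not given here. It assumes a linear gap penalty, illustrative scoring values, and a simple stand-in filter (an upper bound on the SW score derived from shared character counts); the names `sw_score`, `score_upper_bound`, and `search` are hypothetical.

```python
from collections import Counter

def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def score_upper_bound(a, b, match=2):
    """Stand-in filter (an assumption, not the PA module).

    With mismatch <= 0 and gap <= 0, an alignment can score at most
    `match` per matched pair, and the number of matched pairs cannot
    exceed the number of characters the two sequences share.
    """
    shared = Counter(a) & Counter(b)
    return match * sum(shared.values())

def search(query, database, threshold, match=2, mismatch=-1, gap=-1):
    """Run SW only on sequences the cheap bound cannot rule out."""
    hits, skipped = [], 0
    for name, seq in database.items():
        if score_upper_bound(query, seq, match) < threshold:
            skipped += 1  # eliminated from the search domain
            continue
        score = sw_score(query, seq, match, mismatch, gap)
        if score >= threshold:
            hits.append((name, score))
    return hits, skipped
```

Because the bound never underestimates the true SW score, every sequence it discards would also have failed the threshold, so the set of hits is the same as running SW on the whole database. How much of the database such a filter removes depends entirely on the data and the threshold.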
