A Fast Retrieval of DNA Sequences Using Histogram Information

DNA sequence retrieval is an important topic in bioinformatics algorithm development. However, searching a large DNA sequence database usually takes considerable computational time. This paper presents an efficient hierarchical method that improves search speed while keeping accuracy constant. For a given query sequence, a fast histogram method is first used to scan the sequences in the database, and a large number of DNA sequences with low similarity are excluded from later searching. The Smith-Waterman algorithm is then applied to each remaining sequence. Experimental results show that the proposed method, which combines histogram information with the Smith-Waterman algorithm, is a more efficient algorithm for DNA sequence retrieval.
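The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not specify the histogram features or the filtering rule, so everything here is an assumption made for illustration: the histogram is taken to be the count of each base (A, C, G, T), the filter is an L1 distance with a hypothetical `max_distance` cutoff, and the Smith-Waterman scoring values (match +2, mismatch -1, linear gap -1) are arbitrary. The function names are likewise hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def histogram(seq):
    """Count occurrences of each base (assumed histogram feature)."""
    counts = Counter(seq)
    return [counts[base] for base in ALPHABET]


def histogram_distance(h1, h2):
    """L1 distance between two base-count histograms (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell score never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def retrieve(query, database, max_distance):
    """Stage 1: histogram filter. Stage 2: Smith-Waterman on the survivors."""
    q_hist = histogram(query)
    survivors = [
        seq for seq in database
        if histogram_distance(q_hist, histogram(seq)) <= max_distance
    ]
    scored = [(smith_waterman(query, seq), seq) for seq in survivors]
    # Highest alignment score first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    db = ["ACGT", "TTTT", "ACGA"]
    for score, seq in retrieve("ACGT", db, max_distance=2):
        print(score, seq)
```

In this sketch the histogram pass costs time linear in the sequence length, while Smith-Waterman costs time proportional to the product of the two lengths, so each sequence rejected in stage 1 avoids the expensive step. A simple cutoff like this one can also reject sequences that contain a good local match, so it does not by itself guarantee the constant accuracy the abstract reports.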