Client Side Decompression Technique Provides Faster DNA Sequence Data Delivery

DNA sequences are generally very long chains of sequentially linked nucleotides. There are only four different nucleotides, and combinations of them make up the nucleotide information in the sequence files held by data sources. When a user searches for a sequence from an organism, the data source can send a compressed sequence file to the user. The file is then decompressed at the client end, which reduces transmission time over the Internet. This paper proposes a compression algorithm that provides a moderately high compression ratio with minimal decompression time. We also compare a number of different compression techniques to identify efficient methods for delivering sequences from an intelligent genomic search agent over the Internet.
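
Because there are only four nucleotides, each base needs just 2 bits instead of the 8 bits of a plain-text character. The sketch below illustrates that idea and the server-pack / client-unpack split described above. It is a hypothetical baseline, not the algorithm proposed in this paper. The function names `pack` and `unpack` are illustrative, and the code assumes the input contains only uppercase A, C, G and T (no N or other IUPAC ambiguity codes).

```python
# Hypothetical sketch: NOT the paper's algorithm.
# Fixed 2-bit packing of nucleotides (4 bases per byte). The server packs,
# and the client unpacks in one linear pass with a table lookup per base.
# Assumes the sequence contains only uppercase A, C, G, T.

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> tuple[int, bytes]:
    """Server side: encode each base in 2 bits; return (length, payload)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            # First base of the group goes into the two highest bits.
            byte |= CODE[base] << (6 - 2 * j)
        out.append(byte)
    return len(seq), bytes(out)


def unpack(length: int, payload: bytes) -> str:
    """Client side: expand each byte back into 4 bases, then trim padding."""
    bases = []
    for byte in payload:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    # The last byte may be zero-padded, so keep only the original length.
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    n, data = pack(seq)
    assert unpack(n, data) == seq
    print(n, "bases ->", len(data), "bytes")  # 10 bases -> 3 bytes
```

This fixed encoding always gives about 2 bits per base. DNA-specific compressors try to go below that rate by exploiting repeats and statistical context, usually at some cost in decompression time. That cost is the trade-off the abstract refers to.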
