An improved method for DNA sequence compression

DNA (deoxyribonucleic acid) is the hereditary material in humans and almost all other organisms. Nearly every cell in a person's body has the same DNA. The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). With continuing advances in sequencing technology, large amounts of biological data are being generated, which makes DNA sequences difficult to store, analyze, and process. There is therefore a strong need to reduce the size of DNA sequence data, and DNA compression is employed for this purpose. In this paper, we propose an efficient and fast DNA sequence compression algorithm based on differential direct coding and a variable look-up table (LUT).
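Because DNA uses only four bases, each base can be stored in 2 bits instead of an 8-bit ASCII character, a 4:1 reduction before any further modeling. The sketch below illustrates that direct 2-bit coding and, as an assumption about the "differential" part, records any non-ACGT symbol (for example `N`) as an exception whose position is stored as the difference from the previous exception's position. It is a hypothetical illustration of the general idea, not the algorithm proposed in this paper, and it omits the look-up table stage entirely; the names `encode`, `decode`, and `BASE_TO_BITS` are invented for this example.

```python
# Hypothetical sketch: direct 2-bit coding of A/C/G/T plus delta-coded exceptions.
# The base-to-bits assignment is an assumption; any one-to-one mapping works.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> tuple[bytes, int, list[tuple[int, str]]]:
    """Pack A/C/G/T four per byte; store other symbols as (gap, symbol) exceptions.

    Each gap is the distance from the previous exception (or from position 0
    for the first one), i.e. exception positions are differentially coded.
    """
    packed = bytearray((len(seq) + 3) // 4)
    exceptions: list[tuple[int, str]] = []
    last = 0
    for i, ch in enumerate(seq):
        code = BASE_TO_BITS.get(ch)
        if code is None:
            exceptions.append((i - last, ch))
            last = i
            code = 0  # placeholder bits; the real symbol is restored from exceptions
        # Most significant bit pair holds the first base of each group of four.
        packed[i // 4] |= code << (6 - 2 * (i % 4))
    return bytes(packed), len(seq), exceptions


def decode(packed: bytes, length: int, exceptions: list[tuple[int, str]]) -> str:
    """Unpack the 2-bit codes, then overwrite placeholder positions with exceptions."""
    out = [BITS_TO_BASE[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)]
    pos = 0
    for gap, ch in exceptions:
        pos += gap  # undo the differential coding of positions
        out[pos] = ch
    return "".join(out)


if __name__ == "__main__":
    packed, n, exc = encode("ACGTNNAC")
    print(packed.hex(), n, exc)
    print(decode(packed, n, exc))
```

Running the example packs the 8 symbols into 2 bytes and keeps the two `N` symbols as exceptions with gaps 4 and 1. Storing gaps rather than absolute positions keeps the numbers small when exceptions cluster, which is the usual motivation for differential coding; a real compressor would still need to entropy-code those gaps and handle lowercase (soft-masked) bases more compactly than one exception per base, as this sketch does.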
