A Compression & Encryption Algorithm on DNA Sequences Using Dynamic Look up Table and Modified Huffman Techniques

Storing, transmitting, and securing DNA sequences are well-known research challenges, and the problem has grown with the increasing discovery and availability of DNA sequences. We present a DNA sequence compression algorithm based on a Dynamic Look Up Table (DLUT) and a modified Huffman technique. The DLUT consists of 4^3 = 64 sub-strings, each 3 bases long. Each sub-string is coded by a single ASCII character from 33 (!) to 96 (`), and each of those characters decodes back to its sub-string. Encoding depends on an encryption key chosen by the user from the four bases {a, t, g, c}, and decoding requires the decryption key provided by the user who encoded the data, so decoding succeeds only with the same authenticated input that was used for encoding. The sub-strings are combined into a DLUT-based pre-coding routine. The algorithm is tested on the reverse, complement, and reverse complement of DNA sequences, and also on artificial DNA sequences of equivalent length. Encryption speed and security level are two important measures for evaluating any encryption system. With the proliferation of ubiquitous computing systems, in which digital content is accessed through resource-constrained biological databases, security is a very important concern. Much research has been devoted to finding an encryption system that runs effectively on such biological databases. Protecting data from unauthorized users is the most challenging problem in information security, and the proposed method may protect the data from hackers. It provides three tiers of security: tier one is the ASCII coding, tier two is the nucleotide (a, t, g, or c) chosen by the user, and tier three is a change of label or a change of node position in the Huffman tree. Compressing genome sequences will help increase the efficiency of their use. The greatest advantages of this algorithm are fast execution, small memory occupation, and easy implementation.
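The look-up-table pre-coding described above can be illustrated with a minimal Python sketch. The abstract does not say how the user's base key changes the table, so the rule used here (the key base rotates the alphabet order in which the 64 triplets are enumerated) is an assumption made purely for illustration, as are the names `build_table`, `encode`, and `decode`. Bases left over when the sequence length is not a multiple of 3 are simply returned unencoded, which is also an assumption.

```python
from itertools import product

BASES = "atgc"


def build_table(key_base):
    """Map each of the 64 triplets to one ASCII character in 33..96.

    Assumption (not from the paper): the key base rotates the alphabet
    order used to enumerate the triplets, so each key gives a new table.
    """
    if key_base not in BASES:
        raise ValueError("key must be one of a, t, g, c")
    start = BASES.index(key_base)
    order = BASES[start:] + BASES[:start]
    triplets = ["".join(p) for p in product(order, repeat=3)]
    return {t: chr(33 + i) for i, t in enumerate(triplets)}


def encode(seq, key_base):
    """Pre-code a lowercase DNA string; returns (coded text, leftover bases)."""
    table = build_table(key_base)
    usable = len(seq) - len(seq) % 3
    body = "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))
    return body, seq[usable:]


def decode(body, tail, key_base):
    """Invert encode(); only the same key base restores the sequence."""
    inverse = {c: t for t, c in build_table(key_base).items()}
    return "".join(inverse[c] for c in body) + tail
```

Under this assumed rule, every three bases become one character, and decoding with a different key base yields a different sequence than the original.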
Because the program implementing the technique was originally written in C (Windows XP platform, TC compiler), it can be run on other microcomputers with small changes, depending on the platform and compiler used. Execution is quite fast: all operations complete in a fraction of a second, depending on the required task and on the sequence length. The technique can reach an effective compression ratio of 1.98 bits/base, and even lower. When a user searches for a sequence of an organism, an encrypted compressed sequence file can be sent from the data source to the user. The file can then be decrypted and decompressed at the client end, which reduces transmission time over the Internet. The result is an encrypting compression algorithm that provides a moderately high compression-with-encryption rate with minimal decryption and decompression time.
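The bits/base figure can be made concrete with a small sketch that Huffman-codes the pre-coded characters and divides the total code length by the number of bases. This uses a standard Huffman construction, not the paper's modified variant, whose details the abstract does not give. The `swap_labels` flag is a hypothetical stand-in for the "change of label" tier: it exchanges the 0/1 branch labels, which changes the bit patterns but not the code lengths.

```python
import heapq
from collections import Counter


def huffman_code(symbols, swap_labels=False):
    """Return {symbol: bitstring} for a standard Huffman code."""
    counts = Counter(symbols)
    if not counts:
        raise ValueError("need at least one symbol")
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    first, second = ("1", "0") if swap_labels else ("0", "1")
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prepend one bit to every code in each merged subtree.
        merged = {s: first + c for s, c in left.items()}
        merged.update({s: second + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def bits_per_base(symbols, bases_per_symbol=3):
    """Average number of coded bits spent on each original base."""
    code = huffman_code(symbols)
    total_bits = sum(len(code[s]) for s in symbols)
    return total_bits / (len(symbols) * bases_per_symbol)
```

If all 64 triplet characters occur equally often, each receives a 6-bit code and the result is exactly 2 bits/base; a skewed triplet distribution brings the average below 2, which is the regime in which a figure such as 1.98 bits/base becomes possible.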
