Some possible codes for encrypting data in DNA

Three codes are reported for storing written information in DNA. We refer to these codes as the Huffman code, the comma code and the alternating code. The Huffman code was devised using Huffman's algorithm for constructing economical codes. The comma code uses a single base to punctuate the message, creating an automatic reading frame and DNA that is obviously artificial. The alternating code comprises an alternating sequence of purines and pyrimidines, again creating DNA that is clearly artificial. The Huffman code would be useful for routine, short-term storage purposes, supposing, not unrealistically, that very fast methods for assembling and sequencing large pieces of DNA can be developed. The other two codes would be better suited to archiving data over long periods of time (hundreds to thousands of years).
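As a rough illustration of the ideas named in the abstract, the sketch below builds a Huffman code over the four-letter alphabet A, C, G, T, so that frequent characters receive shorter base strings, and adds a simple check for strict purine/pyrimidine alternation. It is not the code table from the paper: the function names, the letter frequencies and the order in which bases are assigned to branches are all assumptions made for this example.

```python
import heapq
from itertools import count

BASES = "ACGT"       # branch labels; this ordering is an arbitrary assumption
PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def quaternary_huffman(freqs):
    """Build a 4-ary Huffman code: {symbol: codeword written in A/C/G/T}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(weight, next(tie), sym) for sym, weight in freqs.items()]
    # Pad with zero-weight dummies so every merge combines exactly four nodes.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(4)]
        total = sum(weight for weight, _, _ in group)
        children = [node for _, _, node in group]
        heapq.heappush(heap, (total, next(tie), children))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):        # internal node: one branch per base
            for base, child in zip(BASES, node):
                walk(child, prefix + base)
        elif node is not None:            # real symbol (dummies are skipped)
            codes[node] = prefix or BASES[0]

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    """Concatenate the codeword of each character."""
    return "".join(codes[ch] for ch in text)


def decode(dna, codes):
    """Read bases left to right; the code is prefix-free, so no commas are needed."""
    inverse = {word: sym for sym, word in codes.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def is_alternating(dna):
    """True if purines and pyrimidines strictly alternate along the sequence."""
    return all((a in PURINES) != (b in PURINES) for a, b in zip(dna, dna[1:]))


# Hypothetical letter frequencies, chosen only for illustration.
example_freqs = {"e": 12, "t": 9, "a": 8, "o": 7, "n": 6, "s": 6, "z": 1}
example_codes = quaternary_huffman(example_freqs)
example_dna = encode("notes", example_codes)
```

Because the resulting code is prefix-free, `decode(example_dna, example_codes)` recovers the original text without any punctuation between codewords. The trade-off is that a Huffman-coded sequence carries no built-in reading frame or obvious marker of artificial origin, which are the properties the abstract attributes to the comma and alternating codes.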
