On combinatorial DNA word design

We consider the problem of designing DNA codes, namely sets of equi-length words over the alphabet {A, C, G, T} that satisfy certain combinatorial constraints. This problem is motivated by the task of reliably storing and retrieving information in synthetic DNA strands for use in DNA computing or as molecular bar codes in chemical libraries. The primary constraints that we consider, defined with respect to a parameter d, are as follows: for every pair of words w, x in a code, there are at least d mismatches between w and x if w ≠ x, and at least d mismatches between the reverse of w and the Watson-Crick complement of x. Extending classical results from coding theory, we present several upper and lower bounds on the maximum size of such DNA codes and give methods for constructing them. An additional constraint relevant to the design of DNA codes is that the free energies and enthalpies of the code words, and thus their melting temperatures, be similar. We describe dynamic programming algorithms that can (a) calculate the total number of words of length n whose free energy value, as approximated by a formula of Breslauer et al. (1986), falls in a given range, and (b) output a random such word. These algorithms are intended for use in heuristic algorithms for constructing DNA codes.
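The two primary constraints can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names (`hamming`, `wc_complement`, `satisfies_constraints`) are hypothetical, and the sketch assumes the reverse-complement condition is checked for every ordered pair of words, including a word paired with itself, which is one reading of the definition above.

```python
from itertools import combinations

# Watson-Crick pairing: A <-> T, C <-> G.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(u, v))


def wc_complement(word):
    """Position-wise Watson-Crick complement (no reversal)."""
    return "".join(COMPLEMENT[base] for base in word)


def satisfies_constraints(code, d):
    """Check both constraints from the abstract for parameter d.

    Assumption: the reverse-complement check runs over all ordered
    pairs (w, x), including w == x.
    """
    words = list(code)
    # Constraint 1: distinct words differ in at least d positions.
    for w, x in combinations(words, 2):
        if hamming(w, x) < d:
            return False
    # Constraint 2: reverse of w vs. complement of x, at least d mismatches.
    for w in words:
        for x in words:
            if hamming(w[::-1], wc_complement(x)) < d:
                return False
    return True
```

For instance, under these assumptions the single word ACGT fails for any d ≥ 1, because its reverse (TGCA) equals its own Watson-Crick complement, whereas the pair {AACC, CCAA} passes with d = 4.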

[1] Max H. Garzon, et al. Soft molecular computing, 1999, DNA Based Computers.

[2] Justin Pearson, et al. Comma-free codes, 2003.

[3] Russell Deaton, et al. Encoding Genomes for DNA Computing The Molecular Computing Group, 2001.

[4] J. Wetmur. DNA probes: applications of the principles of nucleic acid hybridization, 1991, Critical reviews in biochemistry and molecular biology.

[5] Kalim U. Mir. A restricted genetic alphabet for DNA computing, 1996, DNA Based Computers.

[6] Frank R. Kschischang, et al. Some ternary and quaternary codes and associated sphere packings, 1992, IEEE Trans. Inf. Theory.

[7] L. F. Landweber, et al. Chess games: a model for RNA based computation, 1999, Bio Systems.

[8] H. Blöcker, et al. Predicting DNA duplex stability from the base sequence, 1986, Proceedings of the National Academy of Sciences of the United States of America.

[9] Robert E. Kibler. Some new constant weight codes (Corresp.), 1980, IEEE Trans. Inf. Theory.

[10] F. MacWilliams, et al. The Theory of Error-Correcting Codes, 1977.

[11] Ronald W. Davis, et al. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy, 1996, Nature Genetics.

[12] Warren D. Smith. DNA computers in vitro and vivo, 1995, DNA Based Computers.

[13] S. Golomb, et al. Comma-Free Codes, 1958, Canadian Journal of Mathematics.

[14] Alexander Vardy, et al. The uniqueness of the Best code, 1994, IEEE Trans. Inf. Theory.

[15] Willard L. Eastman, et al. On the construction of comma-free codes, 1965, IEEE Trans. Inf. Theory.

[16] Eric B. Baum, et al. DNA sequences useful for computation, 1996, DNA Based Computers.

[17] Byoung-Tak Zhang, et al. Molecular Algorithms for Efficient and Reliable DNA Computing, 1998.

[18] N. Seeman, et al. Design and self-assembly of two-dimensional DNA crystals, 1998, Nature.

[19] Sydney Brenner, et al. Methods for sorting polynucleotides using oligonucleotide tags, 1997.

[20] Lane A. Hemaspaandra, et al. Using simulated annealing to design good codes, 1987, IEEE Trans. Inf. Theory.

[21] R. Deaton, et al. A statistical mechanical treatment of error in the annealing biostep of DNA computation, 1999.

[22] Max H. Garzon, et al. Biomolecular computing and programming, 1999, IEEE Trans. Evol. Comput.

[23] Alexander Vardy, et al. Two new bounds on the size of binary codes with a minimum distance of three, 1995, Des. Codes Cryptogr.

[24] T. Ericson. Bounds on the size of a code, 1989.

[25] Max H. Garzon, et al. Good encodings for DNA-based solutions to combinatorial problems, 1996, DNA Based Computers.

[26] R. Lerner, et al. Encoded combinatorial chemistry, 1992, Proceedings of the National Academy of Sciences of the United States of America.

[27] N. Seeman. De novo design of sequences for nucleic acid structural engineering, 1990, Journal of biomolecular structure & dynamics.

[28] A. Condon, et al. Demonstration of a word design strategy for DNA computing on surfaces, 1997, Nucleic acids research.

[29] Erik Winfree, et al. A Sticker-Based Model for DNA Computation, 1998, J. Comput. Biol.

[30] N. J. A. Sloane, et al. A new table of constant weight codes, 1990, IEEE Trans. Inf. Theory.

[31] L. M. Adleman, et al. Molecular computation of solutions to combinatorial problems, 1994, Science.