On the Design of Codes for DNA Computing

In this paper, we describe a broad class of problems arising in the design of codes for DNA computing. We focus primarily on design considerations related to two phenomena: secondary structure formation in single-stranded DNA molecules and non-selective cross-hybridization. Secondary structure formation refers to the tendency of single-stranded DNA sequences to fold back upon themselves, thus becoming inactive in the computation process, while non-selective cross-hybridization refers to unwanted pairing between DNA sequences involved in the computation process. We use the Nussinov-Jacobson algorithm for secondary structure prediction to identify design criteria that reduce the possibility of secondary structure formation in a codeword. These criteria can be formulated as constraints on the number of complementary pair matches between a DNA codeword and some of its shifts. We provide a sampling of simple techniques for enumerating and constructing sets of DNA sequences with properties that inhibit non-selective hybridization and secondary structure formation. Novel constructions of such codes use cyclic reversible extended Goppa codes, generalized Hadamard matrices, and a binary mapping approach. Cyclic code constructions are particularly useful because, as we prove, the presence of a cyclic structure reduces the complexity of testing DNA codes for secondary structure formation.
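As a rough illustration of the two quantities mentioned above, the following Python sketch computes (a) a Nussinov-Jacobson-style maximum number of non-crossing base pairs in a single strand and (b) the number of complementary matches between a sequence and one of its shifts. It is a minimal sketch under stated assumptions, not the paper's formulation: only Watson-Crick pairs (A-T, C-G) are counted, every pair scores 1, and the names `max_pairs`, `shift_matches`, and the `min_loop` parameter are hypothetical choices made for this example.

```python
# Assumption: only Watson-Crick pairs count, each with unit score.
PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def max_pairs(seq, min_loop=0):
    """Nussinov-style dynamic program: maximum number of non-crossing
    complementary pairs in seq. Bases i < j may pair only if j - i > min_loop
    (min_loop is an illustrative parameter, not taken from the paper)."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the substring seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: leave base i or base j unpaired.
            score = max(best[i + 1][j], best[i][j - 1])
            # Case 2: pair base i with base j.
            if (seq[i], seq[j]) in PAIRS and span > min_loop:
                inner = best[i + 1][j - 1] if span >= 2 else 0
                score = max(score, inner + 1)
            # Case 3: split the interval into two independent parts.
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


def shift_matches(seq, shift):
    """Count positions i where seq[i] and seq[i + shift] are complementary."""
    return sum(
        (seq[i], seq[i + shift]) in PAIRS for i in range(len(seq) - shift)
    )


if __name__ == "__main__":
    word = "GGGAAACCC"
    print(max_pairs(word, min_loop=3))
    print([shift_matches(word, s) for s in range(1, len(word))])
```

A codeword whose shift-match counts are all small has few candidate pairings for the dynamic program to assemble, which is the intuition behind constraining those counts. The cubic-time table fill above is the generic procedure; it does not exploit any cyclic structure.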
