A DNA assembly model of sentence generation

Recent results in corpus-based linguistics demonstrate that context-appropriate sentences can be generated by a stochastic constraint satisfaction process. Exploiting the similarity between constraint satisfaction and DNA self-assembly, we explore a DNA assembly model of sentence generation. The words and phrases in a language corpus are encoded as DNA molecules to build a language model of the corpus. Given a seed word, new sentences are constructed by a parallel DNA assembly process governed by the probability distribution of the word and phrase molecules. Here, we present our DNA code word design and report a successful demonstration of its feasibility in small-scale wet-lab DNA experiments.
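As a rough software analogy of the generation step described above, the following sketch treats each adjacent word pair in a corpus as a "molecule" whose copy number is its corpus frequency, and grows a sentence from a seed word by repeatedly sampling a pair that matches the current last word. This is an illustrative assumption, not the paper's DNA code word design or wet-lab protocol: the names `build_pool` and `assemble`, the restriction to word pairs, and the toy corpus are all hypothetical, and the sequential sampling loop only mimics one assembly path rather than the parallel process in a test tube.

```python
import random
from collections import Counter, defaultdict


def build_pool(corpus):
    """Count adjacent word pairs; each count stands in for a molecule copy number."""
    pool = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            pool[left][right] += 1
    return pool


def assemble(pool, seed, max_len=10, rng=None):
    """Grow one sentence from `seed` by frequency-weighted sampling of matching pairs."""
    rng = rng or random.Random()
    sentence = [seed]
    while len(sentence) < max_len:
        candidates = pool.get(sentence[-1])
        if not candidates:
            break  # no pair can attach to the last word, so assembly stops
        words = list(candidates)
        weights = [candidates[w] for w in words]
        sentence.append(rng.choices(words, weights=weights, k=1)[0])
    return sentence


if __name__ == "__main__":
    # Hypothetical toy corpus, used only for illustration.
    toy_corpus = ["the cat sat", "the cat ran", "the dog sat"]
    toy_pool = build_pool(toy_corpus)
    print(" ".join(assemble(toy_pool, "the")))
```

In this sketch the requirement that a pair's first word equal the sentence's last word plays the role of the matching constraint, and the weighted draw plays the role of the molecule concentrations; phrase-level units longer than two words would need a richer pool than the one assumed here.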
