Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars

Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part, which provides a finer level of granularity than the traditional notion of the gene. However, no formal method has yet emerged for relating a set of parts to a function. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. In computer science, attribute grammars are used to translate program source code into the computational operations it represents. By associating attributes with parts, modifying the values of these attributes with rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates depend on single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotypes of all sequences in the design space generated by a small library of genetic parts. Attribute grammars thus provide a flexible framework connecting parts with models of biological function. They will be instrumental in building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer-aided design applications for synthetic biology.
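To make the translation pipeline concrete, here is a minimal Python sketch built on loudly stated assumptions. It uses a single hypothetical rule `cassette -> promoter rbs gene terminator`, made-up part names (`pA`, `rbs1`, `gfp`, ...) with placeholder numeric attributes, and an assumed semantic rule in which a gene's expression rate is the product of promoter strength and RBS efficiency (a rate depending on multiple parts) while its degradation rate comes from the gene alone (a single part). It is not the grammar or software described above. It only illustrates the sequence of steps: parse a part sequence and evaluate attributes (pass 1), emit a reaction model (pass 2), and simulate every sequence in the design space of a small library. A deterministic production/degradation model integrated with forward Euler stands in for a real simulator.

```python
from itertools import product

# Hypothetical part library; every name and value is an illustrative placeholder.
PROMOTERS = {"pA": 1.0, "pB": 0.2}   # transcription strength (arbitrary units)
RBSS = {"rbs1": 0.9, "rbs2": 0.3}    # translation efficiency (0 to 1)
GENES = {"gfp": 0.1, "rfp": 0.05}    # protein degradation rate (1/min)
TERMINATORS = {"t1"}


def parse_cassette(tokens):
    """Pass 1: check tokens against the assumed rule
    cassette -> promoter rbs gene terminator, then evaluate attributes."""
    if len(tokens) != 4:
        raise ValueError("expected: promoter rbs gene terminator")
    prom, rbs, gene, term = tokens
    if (prom not in PROMOTERS or rbs not in RBSS
            or gene not in GENES or term not in TERMINATORS):
        raise ValueError(f"sequence does not match the grammar: {tokens}")
    # Synthesized attributes of the cassette node (assumed semantic rules):
    # expression depends on two parts, degradation on the gene alone.
    return {
        "gene": gene,
        "k_expr": PROMOTERS[prom] * RBSS[rbs],
        "k_deg": GENES[gene],
    }


def to_reactions(attrs):
    """Pass 2: emit a tiny reaction network as (label, rate) pairs."""
    g = attrs["gene"]
    return [
        (f"-> {g}", attrs["k_expr"]),  # zero-order production
        (f"{g} ->", attrs["k_deg"]),   # first-order degradation
    ]


def simulate(reactions, t_end=200.0, dt=0.1):
    """Forward Euler for dP/dt = k_expr - k_deg * P, starting from P(0) = 0."""
    (_, k_expr), (_, k_deg) = reactions
    p = 0.0
    for _ in range(round(t_end / dt)):
        p += dt * (k_expr - k_deg * p)
    return p


# Enumerate the full design space generated by the library (2 * 2 * 2 * 1 = 8).
results = {}
for tokens in product(PROMOTERS, RBSS, GENES, TERMINATORS):
    results[tokens] = simulate(to_reactions(parse_cassette(list(tokens))))

for tokens, level in sorted(results.items(), key=lambda kv: -kv[1]):
    print(" ".join(tokens), f"{level:.2f}")
```

With these placeholder values each simulated protein level approaches the steady state `k_expr / k_deg`, so the ranking printed at the end follows directly from the attribute rules; replacing the rules or the simulator changes the model without touching the part library.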