Crick's Hypothesis Revisited: The Existence of a Universal Coding Frame

In 1957 Crick hypothesized that the genetic code was a comma-free code. This property would imply the existence of a universal coding frame and make the set of coding sequences a locally testable language. As the link between nucleotides and amino acids became better understood, it became clear that the genetic code was not comma-free. Crick then adopted a radically different hypothesis: the "frozen accident". However, the notions of comma-free codes and locally testable languages now play a role in DNA computing, while circular codes have been found as subsets of the genetic code. We revisit Crick's 1957 hypothesis in that context. We show that coding sequences from a wide variety of genes from the three domains, eukaryotes, prokaryotes and archaea, are testable by fragments, a property that adapts the notion of local testability to DNA sequences. These results support the existence of a universal coding frame, as the frame of a coding sequence can be determined from one of its fragments, independently of the gene or the organism the coding sequence comes from.
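To make the comma-free property concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method used in the paper: the function names `is_comma_free` and `find_frame` and the two-word toy code are hypothetical, and only the standard textbook definition is used (no codeword may appear in a shifted frame inside the concatenation of two codewords).

```python
from itertools import product


def is_comma_free(code):
    """Return True if no codeword occurs out of frame in any
    concatenation of two codewords (trinucleotide codes only)."""
    words = set(code)
    for x, y in product(words, repeat=2):
        joined = x + y
        # Offsets 1 and 2 are the two shifted reading frames of x + y.
        for shift in (1, 2):
            if joined[shift:shift + 3] in words:
                return False
    return True


def find_frame(sequence, code):
    """Return the offsets (0, 1 or 2) at which every complete
    trinucleotide of the sequence is a codeword."""
    words = set(code)
    frames = []
    for offset in range(3):
        triplets = [sequence[i:i + 3]
                    for i in range(offset, len(sequence) - 2, 3)]
        if triplets and all(t in words for t in triplets):
            frames.append(offset)
    return frames


# Hypothetical toy code, chosen only for illustration.
toy_code = {"ACG", "TCG"}
print(is_comma_free(toy_code))             # comma-free toy code
print(is_comma_free({"ACG", "CGA"}))       # CGA appears shifted in ACGACG
print(find_frame("GACGTCGAC", toy_code))   # only one offset reads as codewords
```

With a comma-free code, a fragment containing at least two consecutive codewords can be read as codewords in only one frame, which is the sense in which such a code would give a universal coding frame. The real genetic code does not satisfy this check, which is why the paper works with the weaker notion of testability by fragments.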
