Unravelling the ORFan Puzzle

ORFans are open reading frames (ORFs) with no detectable sequence similarity to any other sequence in the databases. Each newly sequenced genome contains a significant number of ORFans, and their abundance poses interesting evolutionary puzzles. However, little can be learned about them using bioinformatics tools, and their study seems to have been underemphasized. Here we present some of the questions that the existence of so many ORFans has raised and review some of the studies aimed at understanding ORFans, their functions and their origins. These works have demonstrated that ORFans are an untapped source of research, requiring further computational and experimental studies.
