Poorly conserved ORFs in the genome of the archaea Halobacterium sp. NRC-1 correspond to expressed proteins

MOTIVATION A large fraction of open reading frames (ORFs) identified as 'hypothetical' proteins correspond either to 'conserved hypothetical' proteins, which are sequences homologous to ORFs of unknown function from other organisms, or to hypothetical proteins lacking any significant sequence similarity to other ORFs in the databases. Elucidating the functions and three-dimensional structures of such orphan ORFs, termed ORFans or poorly conserved ORFs (PCOs), is essential for understanding biodiversity. However, it has been claimed that many ORFans may not encode expressed proteins.

RESULTS A genome-wide experimental study of 'paralogous PCOs' in the halophilic archaeon Halobacterium sp. NRC-1 was conducted. Paralogous PCOs are ORFs with at least one homolog in the same organism, but with no clear homologs in other organisms. The results reveal that mRNA is synthesized for a majority of the Halobacterium sp. NRC-1 paralogous PCO families, including those comprising relatively short proteins. This strongly suggests that these paralogous PCOs correspond to true, expressed proteins. Hence, further computational and experimental studies aimed at characterizing PCOs in this and other organisms are merited. Such efforts could shed light on the functions and origins of PCOs, and thereby help explain the vast diversity observed in genetic material.
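The definition of a paralogous PCO given above can be illustrated with a minimal Python sketch. It assumes that significant similarity hits have already been computed by some sequence-search step and are available as `(query_orf, subject_orf, subject_organism)` tuples. The function name `classify_orfs`, the label strings, the ORF identifiers, and the example data are all hypothetical and are not taken from the study; the thresholds used to call a hit "significant" are likewise not specified here.

```python
def classify_orfs(orf_ids, hits, own_genome):
    """Label each ORF by where its significant homologs are found.

    hits: iterable of (query_orf, subject_orf, subject_organism) tuples,
          assumed to contain only hits already judged significant.
    """
    has_paralog = set()   # ORFs with a homolog in the same organism
    has_foreign = set()   # ORFs with a homolog in another organism

    for query, subject, organism in hits:
        if query == subject:
            continue  # ignore self-hits
        if organism == own_genome:
            has_paralog.add(query)
        else:
            has_foreign.add(query)

    labels = {}
    for orf in orf_ids:
        if orf in has_foreign:
            labels[orf] = "conserved"        # clear homolog elsewhere
        elif orf in has_paralog:
            labels[orf] = "paralogous_pco"   # homologs only in own genome
        else:
            labels[orf] = "singleton_orfan"  # no homolog found at all
    return labels


# Hypothetical example data (identifiers are made up for illustration).
example_orfs = ["orfA", "orfB", "orfC", "orfD"]
example_hits = [
    ("orfA", "orfA", "NRC-1"),   # self-hit, ignored
    ("orfA", "orfB", "NRC-1"),
    ("orfB", "orfA", "NRC-1"),
    ("orfC", "otherX", "other organism"),
]
labels = classify_orfs(example_orfs, example_hits, own_genome="NRC-1")
```

In this sketch an ORF with any hit outside its own genome is treated as conserved even if it also has paralogs, which matches the "no clear homologs in other organisms" condition in the definition.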
