Localizing proteins in the cell from their phylogenetic profiles.

We introduce a computational method for identifying subcellular locations of proteins from the phylogenetic distribution of the homologs of organellar proteins. This method is based on the observation that proteins localized to a given organelle by experiments tend to share a characteristic phylogenetic distribution of their homologs, termed a phylogenetic profile. Any other protein can therefore be assigned a location by comparing its phylogenetic profile with those of proteins of known location. Application of this method to mitochondrial proteins reveals that nucleus-encoded proteins previously known to be destined for mitochondria fall into three groups: prokaryote-derived, eukaryote-derived, and organism-specific (i.e., found only in the organism under study). Prokaryote-derived mitochondrial proteins can be identified effectively by their phylogenetic profiles. In the yeast Saccharomyces cerevisiae, 361 nucleus-encoded mitochondrial proteins can be identified at 50% accuracy with 58% coverage. From these values and the proportion of conserved mitochondrial genes, it can be inferred that approximately 630 genes, or 10% of the nuclear genome, are devoted to mitochondrial function. In the worm Caenorhabditis elegans, we estimate that there are approximately 660 nucleus-encoded mitochondrial genes, or 4% of its genome, with approximately 400 of these genes contributed by the prokaryotic mitochondrial ancestor. The large fraction of organism-specific and eukaryote-derived genes suggests that mitochondria perform specialized roles absent from their prokaryotic ancestors. We observe measurably distinct phylogenetic profiles among proteins from different subcellular compartments, allowing the general use of prokaryotic genomes in learning features of eukaryotic proteins.
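The idea of localizing a protein by its phylogenetic profile can be illustrated with a minimal sketch. This is not the classifier used in the paper. It assumes binary profiles (1 if a homolog is detected in a reference genome, 0 otherwise), a simple nearest-centroid rule, and hypothetical toy data; the genome list, the function names, and the scoring rule are all illustrative assumptions. The last function reads "accuracy" as the fraction of predictions that are correct and "coverage" as the fraction of identifiable mitochondrial proteins recovered, which is also an assumed interpretation.

```python
from typing import Dict, List

# Hypothetical reference genomes; each profile position corresponds to one of them.
GENOMES = ["E. coli", "R. prowazekii", "M. jannaschii", "S. pombe", "H. sapiens"]

# Hypothetical training set: binary profiles of proteins with known location.
KNOWN: Dict[str, List[List[int]]] = {
    "mitochondrion": [[1, 1, 0, 1, 1], [1, 1, 0, 1, 1], [1, 1, 1, 1, 1]],
    "nucleus": [[0, 0, 0, 1, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 1]],
}


def mean_profile(profiles: List[List[int]]) -> List[float]:
    """Average the profiles column by column to get a per-genome frequency."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def score(profile: List[int], centroid: List[float]) -> float:
    """Similarity in [0, 1]: one minus the mean absolute difference."""
    diff = sum(abs(p - c) for p, c in zip(profile, centroid))
    return 1.0 - diff / len(profile)


def predict(profile: List[int], centroids: Dict[str, List[float]]) -> str:
    """Assign the location whose mean profile is most similar to the query."""
    return max(centroids, key=lambda loc: score(profile, centroids[loc]))


def expected_true_total(n_predicted: int, accuracy: float, coverage: float) -> float:
    """Correct predictions divided by coverage estimates the size of the
    identifiable class (assumed reading of accuracy and coverage)."""
    return n_predicted * accuracy / coverage


CENTROIDS = {loc: mean_profile(p) for loc, p in KNOWN.items()}

if __name__ == "__main__":
    query = [1, 1, 0, 0, 1]  # hypothetical protein with mostly prokaryotic homologs
    print(predict(query, CENTROIDS))
    print(round(expected_true_total(361, 0.50, 0.58)))
```

With the yeast figures quoted above, the last call gives roughly 311 proteins in the identifiable class under the assumed reading. Getting from that value to the estimate of about 630 genes also requires the proportion of conserved mitochondrial genes, which the abstract does not state, so the sketch stops short of that step.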
