MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins

MOTIVATION: Currently available methods for predicting the subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins.

RESULTS: MITOPRED predicted mitochondrial proteins with 100% specificity at 44% sensitivity and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. As measured by the Matthews correlation coefficient, MITOPRED (0.73) clearly outperforms two popular methods, TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. For human, our method estimated 1362 mitochondrial proteins, corresponding to 4.8% of the total proteome.

AVAILABILITY: The MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors.

SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu.
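For readers unfamiliar with the Matthews correlation coefficient used in the comparison above, the following is a minimal sketch of how it is computed from confusion-matrix counts, alongside sensitivity and specificity. The function name `confusion_metrics` and the counts are hypothetical and are not taken from the paper. The sketch assumes the standard definition of specificity, TN / (TN + FP); the abstract does not state which definition the authors used, so the reported percentages may follow a different convention.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Assumes the standard definitions; the paper's own definitions
    are not given in the abstract.
    """
    # Sensitivity: fraction of true mitochondrial proteins recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-mitochondrial proteins correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC: correlation between predicted and observed classes, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts for illustration only (not data from the paper).
sens, spec, mcc = confusion_metrics(tp=90, fp=10, tn=90, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Unlike sensitivity or specificity alone, the MCC combines all four counts into a single value, which is why it is convenient for comparing predictors that operate at different thresholds.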
