In Silico screening for functional candidates amongst hypothetical proteins

Background: A hypothetical protein is a protein that is predicted to be expressed from an open reading frame, but for which there is no experimental evidence of translation. Hypothetical proteins constitute a substantial fraction of the proteomes of humans and other eukaryotes. Given the general belief that the majority of hypothetical proteins are the products of pseudogenes, a tool is needed that can pinpoint the minority of hypothetical proteins with a high probability of being expressed.

Results: Here, we present an in silico selection strategy in which eukaryotic hypothetical proteins are sorted according to two criteria that can be reliably identified in silico: the presence of subcellular targeting signals and the presence of characterized protein domains. To validate the selection strategy, we applied it to a database of human hypothetical proteins dating from 2006 and compared the proteins predicted to be expressed by our selection strategy with their status in 2008. For the comparison we focused on mitochondrial proteins, since a considerable amount of research addressed this field between 2006 and 2008. As a result, many proteins defined as hypothetical in 2006 have since been characterized as mitochondrial.

Conclusion: Of all human proteins that were hypothetical in 2006, 21% have since been experimentally characterized, and 6% of those have been shown to have a role in a mitochondrial context. In contrast, of the hypothetical proteins from the 2006 dataset that our strategy selected and predicted to have a mitochondrial role, 53-62% have since been experimentally characterized, and 85% of these had been assigned a role in mitochondria by 2008. Our in silico selection strategy can therefore be used to select the most promising candidates for subsequent in vitro and in vivo analyses.
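The two-criterion sorting and the reported percentages can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` record, its field names, the accession strings, and the rule of ranking by the number of criteria met are all assumptions made for illustration, and the boolean flags stand in for the output of whatever targeting-signal and domain predictors are actually used. The `joint_fraction` helper only multiplies the two percentages quoted in the abstract to give the share of proteins that were both characterized and assigned a mitochondrial role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one hypothetical protein (illustrative only)."""
    accession: str                # placeholder identifier, not a real accession
    has_targeting_signal: bool    # assumed output of a localization predictor
    has_known_domain: bool        # assumed output of a domain search


def criteria_met(protein: Candidate) -> int:
    """Count how many of the two in silico criteria the protein satisfies."""
    return int(protein.has_targeting_signal) + int(protein.has_known_domain)


def rank_candidates(proteins: list[Candidate]) -> list[Candidate]:
    """Sort proteins so that those meeting more criteria come first.

    Assumption: ranking by the count of criteria met. The sort is stable,
    so proteins with equal counts keep their input order.
    """
    return sorted(proteins, key=criteria_met, reverse=True)


def joint_fraction(characterized: float, mitochondrial_given_characterized: float) -> float:
    """Fraction of a set that is both characterized and mitochondrial."""
    return characterized * mitochondrial_given_characterized


if __name__ == "__main__":
    demo = [
        Candidate("HYP_A", False, False),
        Candidate("HYP_B", True, True),
        Candidate("HYP_C", True, False),
    ]
    for protein in rank_candidates(demo):
        print(protein.accession, criteria_met(protein))

    # Percentages taken from the abstract; the products are derived here.
    baseline = joint_fraction(0.21, 0.06)
    selected_low = joint_fraction(0.53, 0.85)
    selected_high = joint_fraction(0.62, 0.85)
    print(f"all 2006 hypothetical proteins: {baseline:.1%}")
    print(f"selected candidates: {selected_low:.1%} to {selected_high:.1%}")
```

Read this way, roughly 1.3% of all proteins hypothetical in 2006 ended up characterized with a mitochondrial role, against roughly 45% to 53% of the selected candidates.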
