Functionality of system components: conservation of protein function in protein feature space.

Many protein features useful for prediction of protein function can be predicted from sequence, including posttranslational modifications, subcellular localization, and physical/chemical properties. We show here that such protein features are more conserved among orthologs than paralogs, indicating they are crucial for protein function and thus subject to selective pressure. This means that a function prediction method based on sequence-derived features may be able to discriminate between proteins with different function even when they have highly similar structure. Also, such a method is likely to perform well on organisms other than the one on which it was trained. We evaluate the performance of such a method, ProtFun, which relies on protein features as its sole input, and show that the method gives similar performance for most eukaryotes and performs much better than anticipated on archaea and bacteria. From this analysis, we conclude that for the posttranslational modifications studied, both the cellular use and the sequence motifs are conserved within Eukarya.

[1]  S Brunak,et al.  On the total number of genes and their length distribution in complete microbial genomes. , 2001, Trends in genetics : TIG.

[2]  Christian E. V. Storm,et al.  Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. , 2001, Journal of molecular biology.

[3]  Y. Sakaki,et al.  Accumulation of Species-Specific Amino Acid Replacements That Cause Loss of Particular Protein Functions in Buchnera, an Endocellular Bacterial Symbiont , 2001, Journal of Molecular Evolution.

[4]  O. Lund,et al.  NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility , 1998, Glycoconjugate Journal.

[5]  S. Brunak,et al.  SHORT COMMUNICATION Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites , 1997 .

[6]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[7]  Annabel E. Todd,et al.  Evolution of function in protein superfamilies, from a structural perspective. , 2001, Journal of molecular biology.

[8]  H Nielsen,et al.  Machine learning approaches for the prediction of signal peptides and other protein sorting signals. , 1999, Protein engineering.

[9]  M. Kanehisa,et al.  Prediction of protein function from sequence properties. Discriminant analysis of a data base. , 1984, Biochimica et biophysica acta.

[10]  Wen Huang,et al.  The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant , 2001, Nucleic Acids Res..

[11]  Philip Lijnzaad,et al.  The Ensembl genome database project , 2002, Nucleic Acids Res..

[12]  R. Jensen Orthologs and paralogs - we need to get it right , 2001, Genome Biology.

[13]  John C. Wootton,et al.  Non-globular Domains in Protein Sequences: Automated Segmentation Using Complexity Measures , 1994, Comput. Chem..

[14]  Miguel A. Andrade-Navarro,et al.  Automated genome sequence analysis and annotation , 1999, Bioinform..

[15]  K. Nakai,et al.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. , 1999, Trends in biochemical sciences.

[16]  R. Spiro Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. , 2002, Glycobiology.

[17]  N. Blom,et al.  Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. , 1999, Journal of molecular biology.

[18]  C. A. Andersen,et al.  Prediction of human protein function from post-translational modifications and localization features. , 2002, Journal of molecular biology.

[19]  M. Hattori,et al.  Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS , 2000, Nature.

[20]  Chris Sander,et al.  EUCLID: automatic classification of proteins in functional classes by their database annotations , 1998, Bioinform..

[21]  A. Krogh,et al.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. , 2001, Journal of molecular biology.

[22]  Pierre Baldi,et al.  Assessing the accuracy of prediction algorithms for classification: an overview , 2000, Bioinform..

[23]  S. Rogers,et al.  PEST sequences and regulation by proteolysis. , 1996, Trends in biochemical sciences.

[24]  S. Brunak,et al.  Orphan protein function and its relation to glycosylation. , 2002, Ernst Schering Research Foundation workshop.