Systems Biology via Redescription and Ontologies (III): Protein Classification Using Malaria Parasite's Temporal Transcriptomic Profiles

This paper addresses the protein classification problem and explores how its accuracy can be improved by using information from time-course gene expression data. The methods are tested on data from the most deadly species of the parasite responsible for malaria infections, Plasmodium falciparum. Even though a vaccine against malaria has been under intense study for many years, more than half of Plasmodium proteins remain uncharacterized and are therefore excluded from clinical trials. The task is further complicated by the rapid life cycle of the parasite, which makes precise targeting of the appropriate proteins for vaccination a technical challenge. We propose to integrate protein-protein interactions (PPIs), sequence similarity, metabolic pathways, and gene expression to produce a suitable set of predicted protein functions for P. falciparum. Further, we treat gene expression data with respect to the changes that occur during the five phases of the intraerythrocytic developmental cycle (IDC) of P. falciparum, as determined by our segmentation algorithm, and show that this analysis yields significantly improved protein function prediction compared with, for example, analysis based on Pearson correlation coefficients observed in the data. The algorithm is able to assign "meaningful" functions to 628 out of 1439 previously unannotated proteins, which are first-choice candidates for experimental vaccine research.
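The contrast between a single correlation over the whole time course and a phase-wise view can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the toy expression profiles, and the phase boundaries below are hypothetical, and the boundaries are supplied by hand instead of being produced by the segmentation algorithm. The sketch only shows how two genes can look uncorrelated overall while being strongly related within individual phases.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has no defined correlation; report 0.0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def phase_correlations(x, y, phases):
    """Pearson correlation computed separately within each phase.

    `phases` is a list of (start, end) index pairs, end exclusive.
    In this sketch the phases are given by hand (an assumption).
    """
    return [pearson(x[s:e], y[s:e]) for s, e in phases]


# Hypothetical expression profiles for two genes over 8 time points.
gene_a = [1, 2, 3, 4, 5, 6, 7, 8]
gene_b = [1, 2, 3, 4, 4, 3, 2, 1]

# Hypothetical phase boundaries (two phases only, for brevity).
phases = [(0, 4), (4, 8)]

global_r = pearson(gene_a, gene_b)
per_phase_r = phase_correlations(gene_a, gene_b, phases)
print(global_r, per_phase_r)
```

In this toy case the two profiles rise together in the first phase and move in opposite directions in the second, so the phase-wise values carry a signal that the single global coefficient averages away.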
