Combining gene expression profiles and protein-protein interaction data to infer gene functions.

The ever-increasing flow of gene expression profiles and protein-protein interactions has catalyzed many computational approaches for inference of gene functions. Despite all the efforts, there is still room for improvement, for the information enriched in each biological data source has not been exploited to its fullness. A composite method is proposed for classifying unannotated genes based on expression data and protein-protein interaction (PPI) data, which extracts information from both data sources in novel ways. With the noise nature of expression data taken into consideration, importance is attached to the consensus expression patterns of gene classes instead of the actual expression profiles of individual genes, thus characterizing the composite method with enhanced robustness against microarray data variation. With regard to the PPI network, the traditional clear-cut binary attitude towards inter- and intra-functional interactions is abandoned, whereas a more objective perspective into the PPI network structure is formed through incorporating the varied function-function interaction probabilities into the algorithm. The composite method was implemented in two numerical experiments, where its improvement over single-data-source based methods was observed and the superiority of the novel data handling operations was discussed.

[1]  Jan Komorowski,et al.  Predicting Gene Function from Gene Expressions and Ontologies , 2000, Pacific Symposium on Biocomputing.

[2]  A. Owen,et al.  A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae) , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[3]  D. Botstein,et al.  Genomic expression programs in the response of yeast cells to environmental changes. , 2000, Molecular biology of the cell.

[4]  S. Fields,et al.  The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Dmitrij Frishman,et al.  MIPS: analysis and annotation of proteins from whole genomes in 2005 , 2005, Nucleic Acids Res..

[6]  W. Fung,et al.  Detecting differentially expressed genes by relative entropy. , 2005, Journal of theoretical biology.

[7]  W. Wong,et al.  Transitive functional annotation by shortest-path analysis of gene expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[8]  David Martin,et al.  Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network , 2003, Genome Biology.

[9]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[10]  T. Takagi,et al.  Assessment of prediction accuracy of protein function from protein–protein interaction data , 2001, Yeast.

[11]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[12]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[13]  H. Mewes,et al.  The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. , 2004, Nucleic acids research.

[14]  Stanley Letovsky,et al.  Predicting protein function from protein/protein interaction data: a probabilistic approach , 2003, ISMB.

[15]  T. Joshi,et al.  Genome-scale gene function prediction using multiple sources of high-throughput data in yeast Saccharomyces cerevisiae. , 2004, Omics : a journal of integrative biology.

[16]  Dong Xu,et al.  Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae. , 2004, Nucleic acids research.

[17]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 2000, Nucleic Acids Res..

[18]  Jan Komorowski,et al.  Predicting gene ontology biological process from temporal gene expression patterns. , 2003, Genome research.

[19]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[20]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[21]  R. Albert,et al.  The large-scale organization of metabolic networks , 2000, Nature.

[22]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[23]  G. Ramsay DNA chips: State-of-the art , 1998, Nature Biotechnology.

[24]  Albrecht Rau,et al.  Learning multi-class classification problems , 1992 .

[25]  George Karypis,et al.  Gene Classification Using Expression Profiles: A Feasibility Study , 2005, Int. J. Artif. Intell. Tools.