A Weighted Power Framework for Integrating Multisource Information: Gene Function Prediction in Yeast

Predicting the functions of unannotated genes is a major challenge in biological research. In this study, we propose a weighted power scoring framework, called the weighted power biological score (WPBS), for combining different biological data sources and predicting the functions of some unclassified genes of the yeast Saccharomyces cerevisiae. The relative power and weight coefficients of the different data sources in the proposed score are estimated systematically using the functional annotations [yeast Gene Ontology (GO)-Slim: Process] of classified genes, available from the Saccharomyces Genome Database. Genes are then clustered by applying the k-medoids algorithm to WPBS, and the functional categories of 334 unclassified genes are predicted using a P-value cutoff of 1 × 10⁻⁵. The WPBS is available online at http://www.isical.ac.in/~shubhra/WPBS/WPBS.html, where one can download WPBS, related files, and MATLAB code for predicting the functions of unclassified genes.
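
The abstract does not give the exact form of WPBS, so the following is only a minimal sketch of the general idea, not the authors' method or their MATLAB code. It assumes the combined score is a weighted sum of per-source similarity matrices, each raised to its own power, i.e. score = Σ w_i · S_i^(p_i); this functional form, the toy matrices, the weights and powers, and the helper names `weighted_power_score` and `k_medoids` are all hypothetical. The clustering step is a simplified alternating k-medoids with a deterministic farthest-first start, not the full PAM procedure.

```python
import numpy as np

def weighted_power_score(sources, weights, powers):
    """Combine similarity matrices as sum_i w_i * S_i ** p_i (assumed form)."""
    combined = np.zeros_like(sources[0], dtype=float)
    for sim, w, p in zip(sources, weights, powers):
        combined += w * np.power(sim, p)
    return combined

def k_medoids(dist, k, n_iter=100):
    """Simplified alternating k-medoids on a precomputed distance matrix."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assign every gene to its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size == 0:
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data: 6 hypothetical genes forming two groups of 3,
# described by two made-up similarity sources with values in [0, 1].
block = np.kron(np.eye(2), np.ones((3, 3)))
s1 = 0.1 + 0.8 * block  # e.g. an expression-based similarity
s2 = 0.2 + 0.6 * block  # e.g. an interaction-based similarity

# Hypothetical weights and powers; the paper estimates these from annotations.
score = weighted_power_score([s1, s2], weights=[0.6, 0.4], powers=[2.0, 1.0])

# Turn the similarity score into a distance for clustering.
dist = score.max() - score
np.fill_diagonal(dist, 0.0)

medoids, labels = k_medoids(dist, k=2)
print("medoids:", medoids)
print("labels:", labels)
```

In this toy case the two groups of three genes end up in separate clusters. In the paper's setting, each resulting cluster would then be tested for enrichment in a functional category, and unclassified genes in significantly enriched clusters would receive that category; that step is omitted here.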
