CombFunc: predicting protein function using heterogeneous data sources

Only a small fraction of known proteins have been functionally characterized, making protein function prediction essential to propose annotations for uncharacterized proteins. In recent years many function prediction methods have been developed using various sources of biological data from protein sequence and structure to gene expression data. Here we present the CombFunc web server, which makes Gene Ontology (GO)-based protein function predictions. CombFunc incorporates ConFunc, our existing function prediction method, with other approaches for function prediction that use protein sequence, gene expression and protein–protein interaction data. In benchmarking on a set of 1686 proteins CombFunc obtains precision and recall of 0.71 and 0.64 respectively for gene ontology molecular function terms. For biological process GO terms precision of 0.74 and recall of 0.41 is obtained. CombFunc is available at http://www.sbg.bio.ic.ac.uk/combfunc.

[1]  Rachael P. Huntley,et al.  The UniProt-GO Annotation database in 2011 , 2011, Nucleic Acids Res..

[2]  Janet M. Thornton,et al.  ProFunc: a server for predicting protein function from 3D structure , 2005, Nucleic Acids Res..

[3]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Alessandro Vespignani,et al.  Global protein function prediction from protein-protein interaction networks , 2003, Nature Biotechnology.

[5]  Simon Kasif,et al.  Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data , 2007, PloS one.

[6]  E. Birney,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[7]  Livia Perfetto,et al.  MINT, the molecular interaction database: 2012 update , 2011, Nucleic Acids Res..

[8]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Geoffrey J. Barton,et al.  GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes , 2004, BMC Bioinformatics.

[10]  O. Troyanskaya,et al.  Predicting gene function in a hierarchical context with an ensemble of classifiers , 2008, Genome Biology.

[11]  Michael J. E. Sternberg,et al.  ConFunc - functional annotation in the twilight zone , 2008, Bioinform..

[12]  Christine A. Orengo,et al.  Inferring Function Using Patterns of Native Disorder in Proteins , 2007, PLoS Comput. Biol..

[13]  B. Schwikowski,et al.  A network of protein–protein interactions in yeast , 2000, Nature Biotechnology.

[14]  Limsoon Wong,et al.  Exploiting Indirect Neighbours and Topological Weight to Predict Protein Function from Protein-Protein Interactions , 2006, BioDM.

[15]  A. Owen,et al.  A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae) , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Daisuke Kihara,et al.  ESG: extended similarity group method for automated protein function prediction , 2008, Bioinform..

[17]  Jihong Guan,et al.  PredUs: a web server for predicting protein interfaces using structural neighbors , 2011, Nucleic Acids Res..

[18]  D. Eisenberg,et al.  Inference of protein function from protein structure. , 2005, Structure.

[19]  Michael J. E. Sternberg,et al.  Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe , 2010, Bioinform..

[20]  Michael J E Sternberg,et al.  Prediction of ligand binding sites using homologous structures and conservation at CASP8 , 2009, Proteins.

[21]  Michael J. E. Sternberg,et al.  3DLigandSite: predicting ligand-binding sites using similar structures , 2010, Nucleic Acids Res..

[22]  D. Kihara,et al.  PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data , 2009, Proteins.

[23]  A. Valencia,et al.  Practical limits of function prediction , 2000, Proteins.

[24]  Michael I. Jordan,et al.  Genome-scale phylogenetic function annotation of large and diverse protein families. , 2011, Genome research.

[25]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[26]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[27]  M. Sternberg,et al.  Proteomic analysis of Plasmodium in the mosquito: progress and pitfalls , 2012, Parasitology.

[28]  Vladimir Vapnik,et al.  An overview of statistical learning theory , 1999, IEEE Trans. Neural Networks.

[29]  Rafael C. Jimenez,et al.  The IntAct molecular interaction database in 2012 , 2011, Nucleic Acids Res..

[30]  M. Sternberg,et al.  Protein structure prediction on the Web: a case study using the Phyre server , 2009, Nature Protocols.

[31]  Johannes Söding,et al.  Protein homology detection by HMM?CHMM comparison , 2005, Bioinform..

[32]  Andreas Martin Lisewski,et al.  Protein function prediction: towards integration of similarity metrics. , 2011, Current opinion in structural biology.

[33]  Michael I. Jordan,et al.  Consistent probabilistic outputs for protein function prediction , 2008, Genome Biology.

[34]  P. Radivojac,et al.  Analysis of protein function and its prediction from amino acid sequence , 2011, Proteins.

[35]  María Martín,et al.  Ongoing and future developments at the Universal Protein Resource , 2010, Nucleic Acids Res..

[36]  SödingJohannes Protein homology detection by HMM--HMM comparison , 2005 .

[37]  M. Helmer-Citterich,et al.  Structure-based function prediction: approaches and applications. , 2008, Briefings in functional genomics & proteomics.

[38]  Robert D. Finn,et al.  InterPro in 2011: new developments in the family and domain prediction database , 2011, Nucleic acids research.

[39]  Ying Gao,et al.  Bioinformatics Applications Note Sequence Analysis Cd-hit Suite: a Web Server for Clustering and Comparing Biological Sequences , 2022 .

[40]  Kengo Kinoshita,et al.  COXPRESdb: a database to compare gene coexpression in seven model animals , 2010, Nucleic Acids Res..

[41]  Kengo Kinoshita,et al.  COXPRESdb: a database of coexpressed gene networks in mammals , 2007, Nucleic Acids Res..

[42]  Alexander J. Smola,et al.  Advances in Large Margin Classifiers , 2000 .

[43]  Erik L. L. Sonnhammer,et al.  Predicting protein function from domain content , 2008, Bioinform..