INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity

Identifying protein functions is useful for numerous applications in biology. However, predicting Gene Ontology (GO) functional terms from sequence remains a challenging task, as shown by the recent CAFA experiments. Here we present INGA, a web server developed to predict protein function from a combination of three orthogonal approaches. Sequence similarity and domain architecture searches are combined with protein-protein interaction network data to derive consensus predictions for GO terms using functional enrichment. The INGA server can be queried both programmatically through RESTful services and through a web interface designed for usability. The latter provides output that supports the GO term predictions with the annotating sequences. INGA is validated on the CAFA-1 data set and was recently shown to perform consistently well in the CAFA-2 blind test. The INGA web server is available at http://protein.bio.unipd.it/inga.
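The abstract does not spell out how functional enrichment turns the three evidence sources into a consensus, so the following is only a minimal sketch of one common approach, not INGA's actual scoring scheme. It assumes that each source (homologs, proteins with a shared domain architecture, interaction partners) yields a list of related proteins with known GO terms, tests each term for over-representation with an upper-tail hypergeometric test, and keeps the best score per term. The function names, the input layout, the background counts and the max-based combination rule are all hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    k: related proteins annotated with the term
    n: related proteins retrieved by one evidence source
    K: background proteins annotated with the term
    N: background size
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def consensus_scores(evidence, background_counts, background_size):
    """Combine per-source enrichment into one score per GO term.

    evidence: dict mapping a source name to a list of GO-term sets,
              one set per related protein (hypothetical layout).
    Returns a dict term -> score in [0, 1], where the score is
    1 - p for the most significant source (an assumed rule).
    """
    scores = {}
    for annotated in evidence.values():
        n = len(annotated)
        terms = set().union(*annotated) if annotated else set()
        for term in terms:
            k = sum(term in go_terms for go_terms in annotated)
            p = enrichment_pvalue(k, n, background_counts[term], background_size)
            scores[term] = max(scores.get(term, 0.0), 1.0 - p)
    return scores


if __name__ == "__main__":
    # Toy data with placeholder term identifiers, not real annotations.
    evidence = {
        "homologs": [{"GO:A"}, {"GO:A"}],
        "interactors": [{"GO:B"}],
    }
    background_counts = {"GO:A": 2, "GO:B": 4}
    print(consensus_scores(evidence, background_counts, background_size=4))
```

Taking the maximum over sources is the simplest possible combination rule; a real predictor would more likely weight the sources and propagate scores along the GO hierarchy.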
