The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided us with genome-wide experimental data for genes associated with biofilm formation and, in P. aeruginosa only, motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. We conclude that, while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bioontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Tapio Salakoski | Feng Zhang | Paolo Fontana | Silvio C. E. Tosatto | Daisuke Kihara | Alfredo Benso | Burkhard Rost | Sabeur Aridhi | Angela D. Wilkins | Olivier Lichtarge | Liisa Holm | Nevena Veljkovic | Stefano Di Carlo | Slobodan Vucetic | Michal Linial | Predrag Radivojac | Claire O'Donovan | Casey S. Greene | Alexandra J. Lee | Giuliano Grossi | Haixuan Yang | Alex A. Freitas | Chenguang Zhao | Maxat Kulmanov | Robert Hoehndorf | Alfonso E. Romero | Yang Zhang | Patricia C. Babbitt | Sean D. Mooney | Renzhi Cao | Jianlin Cheng | Ian Sillitoe | Christine A. Orengo | Jonathan G. Lees | Richard Bonneau | Weidong Tian | Alice C. McHardy | Gianfranco Michele Maria Politano | Larry Davis | Peter L. Freddolino | Steven E. Brenner | Shanfeng Zhu | Hans Moen | Filip Ginter | Daniel B. Roche | Castrense Savojardo | Pier Luigi Martelli | Rita Casadio | Petri Törönen | Marco Mesiti | Ronghui You | Alex Warwick Vesztrocy | Christophe Dessimoz | Marie-Dominique Devignes | Kai Hakala | Marco Falda | Wen-Hung Liao | Elaine Zosa | Peter W. Rose | Volkan Atalay | Alessandro Petrini | Florian Boecker | Sayoni Das | Ehsaneddin Asgari | Alberto Paccanaro | Tomislav Šmuc | Po-Han Chi | Michael L. Tress | Aashish Jain | Jie Hou | Jose Manuel Rodriguez | Rabie Saidi | Matteo Re | Giuseppe Profiti | David Ritchie | Marco Frasca | Marco Notaro | Tunca Doğan | Zheng Wang | Indika Kahanda | Fran Supek | Fabio Fabris | Giorgio Valentini | Natalie Thurlby | Da Chen Emily Koo | Adrian M. Altenhoff | Liam J. McGuffin | Marco Carraro | Seyed Ziaeddin Alborzi | Michele Berselli | Enrico Lavezzo | Ahmet Sureyya Rifaioglu | Neven Sumonja | Julian Gough | Suyang Dai | Tatyana Goldberg | Mark N. Wass | David T. Jones | Iddo Friedberg | Deborah A. Hogan | Constance J. Jeffery | Naihui Zhou | Jonas Reeb | Imane Boudellioua | Jia-Ming Chang | Chengxin Zhang | Hai Fang | Rengul Cetin-Atalay | Devon Johnson | Mateo Torres | Erica Suh | Sašo Džeroski | Jeffrey M. Yunes | Alan Medlar | Qizhong Mao | Branislava Gemović | Radoslav Davidovic | Hafeez Ur Rehman | Meet Barot | Yuxiang Jiang | Timothy Bergquist | Balint Z. Kacsoh | Alex W. Crocker | Kimberley A. Lewis | George E. Georghiou | Huy N. Nguyen | Nafiz Hamid | Alperen Dalkiran | Rebecca L. Hurto | Prajwal Bhat | José M. Fernández | Vladimir Perovic | Mohammad R. K. Mofrad | Alexandre Renaux | Magdalena Antczak | Itamar Borukhov | Ilya B. Novikov | Wei-Cheng Tseng | Vedrana Vidulin | Cen Wan | Domenico Cozzetto | Rui Fa | Vladimir Gligorijević | Stefano Toppo | Damiano Piovesan | Shanshan Zhang | Gage S. Black | Dane Jo | Dallas J. Larsen | Ashton R. Omdahl | Luke W. Sagers | Jonathan B. Dayton | Danielle A. Brackenridge | Zihan Zhang | Shuwei Yao | Caleb Chandler | Miguel Amezola | Yi-Wei Liu | Stefano Pascarelli | Yotam Frank | Farrokh Mehryary | Suwisa Kaewphan | Jari Björne | Martti E. E. Tolvanen | Asa Ben-Hur | Giovanni Bosco | María Martín
