A look back at the quality of Protein Function Prediction tools in CAFA

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but their quality assurance has received relatively little attention. The main challenge in systematically testing AFP software is the lack of a test oracle, the mechanism that determines whether a test case passes or fails; for the AFP task, the exact expected outputs are not well defined. AFP tools therefore face the oracle problem. Metamorphic testing (MT) is a technique for testing programs that face the oracle problem by using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change in response to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. In our initial testing, we observe that several tools fail all the test cases and two tools pass all the test cases on different GO ontologies.
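To make the idea of a metamorphic relation concrete, the following minimal Python sketch checks one possible protein-level MR: reordering the input proteins should leave each protein's predictions unchanged. This relation is an illustrative assumption, not necessarily one of the MRs defined in this work. Both predictors, the scoring rule, and the function names are hypothetical stand-ins for a real AFP tool, and the GO term is used only as a sample label.

```python
from typing import Callable, Dict, List, Tuple

Protein = Tuple[str, str]                  # (identifier, amino-acid sequence)
Predictions = Dict[str, Dict[str, float]]  # protein id -> {GO term -> score}
Predictor = Callable[[List[Protein]], Predictions]


def toy_predictor(proteins: List[Protein]) -> Predictions:
    """Hypothetical AFP stand-in whose scores depend only on the sequence."""
    result: Predictions = {}
    for pid, seq in proteins:
        # Made-up rule: the fraction of leucine residues is used as the score.
        score = round(seq.count("L") / len(seq), 2) if seq else 0.0
        result[pid] = {"GO:0005515": score}
    return result


def order_dependent_predictor(proteins: List[Protein]) -> Predictions:
    """Deliberately faulty stand-in: the score leaks the input position."""
    result: Predictions = {}
    for index, (pid, _seq) in enumerate(proteins):
        result[pid] = {"GO:0005515": round(1.0 / (index + 1), 2)}
    return result


def check_permutation_mr(predict: Predictor, proteins: List[Protein]) -> bool:
    """Return True if reversing the input order leaves all predictions unchanged."""
    source_output = predict(proteins)                     # source test case
    follow_up_output = predict(list(reversed(proteins)))  # follow-up test case
    # No expected output is needed: the two runs are compared with each other.
    return source_output == follow_up_output


if __name__ == "__main__":
    sample = [("P1", "MKLL"), ("P2", "MLAA")]
    print(check_permutation_mr(toy_predictor, sample))
    print(check_permutation_mr(order_dependent_predictor, sample))
```

The check never needs the correct annotation for any protein; it only compares the outputs of the source and follow-up runs, which is how MT sidesteps the oracle problem. A violated relation signals a fault, while a satisfied one does not prove the tool is correct.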
