Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions

Automated Functional Prediction (AFP) of proteins has become a challenging problem in bioinformatics and biomedicine, one that aims to handle and interpret the very large proteomes of several eukaryotic organisms. A central issue in AFP is that public repositories of protein functions, e.g. the Gene Ontology (GO), lack well-defined sets of negative examples from which to learn accurate classifiers. In this paper we investigate the Query by Committee paradigm of active learning to select the negatives that are most informative for the classifier and for the protein function to be inferred. We validated our approach by predicting Gene Ontology functions for S. cerevisiae proteins.
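To make the Query by Committee idea concrete, the following is a minimal sketch under stated assumptions and is not the authors' implementation. It assumes a committee of nearest-centroid classifiers trained on bootstrap resamples, and it assumes vote entropy as the disagreement measure; the function names, the toy feature vectors, and the committee size are all hypothetical. Candidate proteins (those not annotated as positive) on which the committee disagrees most are picked as the most informative negatives.

```python
import math
import random


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def train_member(positives, negatives, rng):
    """One committee member: nearest-centroid classifier on a bootstrap resample."""
    pos = [rng.choice(positives) for _ in positives]
    neg = [rng.choice(negatives) for _ in negatives]
    return centroid(pos), centroid(neg)


def predict(member, x):
    """Return 1 if x is closer to the positive centroid, else 0."""
    c_pos, c_neg = member
    return 1 if math.dist(x, c_pos) < math.dist(x, c_neg) else 0


def vote_entropy(votes):
    """Entropy in bits of binary votes: 0 for full agreement, 1 for an even split."""
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_negatives(positives, seed_negatives, candidates, k,
                     committee_size=7, seed=0):
    """Return indices of the k candidates with the highest committee disagreement."""
    rng = random.Random(seed)
    committee = [train_member(positives, seed_negatives, rng)
                 for _ in range(committee_size)]
    scored = [(vote_entropy([predict(m, x) for m in committee]), i)
              for i, x in enumerate(candidates)]
    # Highest entropy first; ties broken by candidate index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Toy data (hypothetical 2-D protein features).
positives = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
seed_negatives = [[-1.0, -1.0], [-0.9, -1.2], [-1.1, -0.8]]
candidates = [[-2.0, -2.0], [0.0, 0.05], [-0.05, 0.0], [-1.5, -1.0]]
chosen = select_negatives(positives, seed_negatives, candidates, k=2)
```

In this sketch the selected indices would then be added to the negative training set and the loop repeated; candidates far from the decision boundary receive unanimous votes and zero entropy, so they are never preferred over contested ones.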
