Analysis of Informative Features for Negative Selection in Protein Function Prediction

Negative examples in automated protein function prediction (AFP), that is, proteins known not to possess a given function, are usually not stored directly in public proteome and genome databases, such as the Gene Ontology database. Nevertheless, most computational methods need negative examples to infer new predictions. A variety of algorithms have been proposed in AFP for negative selection, ranging from network- and feature-based heuristics to hierarchy-based and hierarchy-less strategies. Moreover, several bio-molecular information sources about proteins, such as gene co-expression, genetic interaction, and protein-protein interaction data, are naturally encoded in protein networks, where nodes are proteins and edges connect proteins sharing common characteristics. Although selecting negatives in biological networks is therefore a central and challenging problem in computational biology, identifying the characteristics a protein should have to be considered a negative remains a difficult task. In this work, we show that a few protein features extracted from the network help in detecting reliable negatives. We tested these features in two real-world experiments: predicting unreliable negatives with an SVM classifier through temporal holdout on model organisms for AFP, and selecting reliable negatives with a state-of-the-art clustering-based negative selection procedure.
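To make the idea of network-derived protein features concrete, the following is a minimal sketch, not the feature set used in this work. It assumes an undirected weighted protein network given as an edge list and computes three illustrative per-node quantities: the weighted degree, the total edge weight toward proteins annotated with the function (positives), and the ratio of the two. The function name `node_features`, the feature names, and the toy protein identifiers are all hypothetical.

```python
from collections import defaultdict


def node_features(edges, positives):
    """Compute simple per-protein features from a weighted network.

    edges: iterable of (u, v, weight) tuples for an undirected network.
    positives: set of proteins annotated with the function of interest.
    These features are illustrative assumptions, not the paper's own.
    """
    weighted_degree = defaultdict(float)
    positive_weight = defaultdict(float)
    for u, v, w in edges:
        # Each undirected edge contributes to both endpoints.
        weighted_degree[u] += w
        weighted_degree[v] += w
        if v in positives:
            positive_weight[u] += w
        if u in positives:
            positive_weight[v] += w

    features = {}
    for node, deg in weighted_degree.items():
        pos = positive_weight[node]
        features[node] = {
            "weighted_degree": deg,
            "positive_weight": pos,
            # Share of a protein's connectivity that points to positives.
            "positive_fraction": pos / deg if deg > 0 else 0.0,
        }
    return features


# Toy network: P1 is the only positive protein.
edges = [("P1", "P2", 0.5), ("P2", "P3", 0.25), ("P3", "P4", 0.75)]
positives = {"P1"}
feats = node_features(edges, positives)
```

Under this sketch, a non-positive protein with a low `positive_fraction` (such as `P4` in the toy network) would be a more plausible candidate negative than one strongly connected to positives (such as `P2`); whether such a criterion is reliable is exactly what an evaluation like temporal holdout is meant to check.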
