A Neural Procedure for Gene Function Prediction

Given a weighted graph and a partial labeling of its nodes, the graph classification problem consists of extending the labels to all nodes. In many real-world contexts, such as gene function prediction, the partial labeling is unbalanced: positive labels are far fewer than negative ones. In this paper we present a new neural algorithm for predicting labels in the presence of label imbalance. The algorithm is based on a family of Hopfield networks described by two continuous parameters and one discrete parameter, and it consists of two main steps: 1) the network parameters are learned through a cost-sensitive optimization procedure based on local search; 2) a suitable Hopfield network restricted to the unlabeled nodes is considered and simulated. The equilibrium point reached induces the classification of the unlabeled nodes. An experimental analysis on real-world unbalanced data, in the context of genome-wide prediction of gene functions, shows the effectiveness of the proposed approach.
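To make step 2 concrete, the following is a minimal sketch of simulating a Hopfield network restricted to the unlabeled nodes, with the labeled nodes held fixed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say what the network parameters are, so the function name `simulate_restricted_hopfield` and the parameters `pos`, `neg`, and `threshold` (activation values for the two classes and a neuron threshold) are hypothetical. Step 1, the cost-sensitive local search that would select such parameters, is not sketched here.

```python
def simulate_restricted_hopfield(W, labels, pos=1.0, neg=-1.0,
                                 threshold=0.0, max_sweeps=100):
    """Run asynchronous Hopfield updates on the unlabeled nodes only.

    W         -- symmetric weight matrix (list of lists) with zero diagonal
    labels    -- +1 / -1 for labeled nodes, None for unlabeled nodes
    pos, neg  -- hypothetical activation values for the two classes
    threshold -- hypothetical neuron threshold
    Returns a list of +1 / -1 labels for every node.
    """
    n = len(W)
    labeled = [i for i in range(n) if labels[i] is not None]
    unlabeled = [i for i in range(n) if labels[i] is None]

    # Labeled nodes are clamped; unlabeled nodes start in the negative state.
    state = [pos if labels[i] == 1 else neg for i in range(n)]

    # The clamped nodes contribute a constant input to each unlabeled node,
    # so that input can be folded into a per-node effective threshold.
    eff_threshold = {
        i: threshold - sum(W[i][j] * state[j] for j in labeled)
        for i in unlabeled
    }

    for _ in range(max_sweeps):
        changed = False
        for i in unlabeled:
            # Input coming from the other unlabeled nodes.
            field = sum(W[i][j] * state[j] for j in unlabeled if j != i)
            new_value = pos if field - eff_threshold[i] > 0 else neg
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # no neuron changed in a full sweep: equilibrium
            break

    return [1 if s == pos else -1 for s in state]


# Toy graph: node 0 (positive) -- node 1 -- node 2 -- node 3 (negative),
# with a weak link between the two unlabeled nodes 1 and 2.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
labels = [1, None, None, -1]
predicted = simulate_restricted_hopfield(W, labels)
print(predicted)  # [1, 1, -1, -1]
```

In this toy case each unlabeled node takes the label of its strongly connected labeled neighbor. With symmetric weights and a zero diagonal, asynchronous updates of this kind reach a fixed point, which is the equilibrium the abstract refers to.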
