Sequential Linear Neighborhood Propagation for Semi-Supervised Protein Function Prediction

Predicting protein function is one of the most challenging problems of the post-genomic era. The development of experimental methods for genome-scale analysis of molecular interaction networks has provided new approaches to inferring protein function. In this paper we introduce a new graph-based semi-supervised classification algorithm, Sequential Linear Neighborhood Propagation (SLNP), which addresses the classification of partially labeled protein interaction networks. SLNP first constructs a sequence of node sets ordered by their shortest-path distance to the labeled nodes, and then predicts the function of the unlabeled proteins one set at a time, starting with the set closest to the labeled nodes, using Linear Neighborhood Propagation. Its performance is assessed on Saccharomyces cerevisiae protein-protein interaction (PPI) network data sets, where it compares favorably with three current state-of-the-art algorithms, especially in settings where only a small fraction of the proteins are labeled.
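To make the two-stage idea concrete, the following is a minimal sketch of distance-ordered propagation on a toy graph. It is not the SLNP implementation: the function names `layer_by_distance` and `sequential_propagate`, the adjacency-dict representation, and the single-function score in [0, 1] are all assumptions made for illustration. In particular, the Linear Neighborhood Propagation step is replaced here by a plain uniform average over already-scored neighbors, which only stands in for the learned neighborhood weights.

```python
from collections import deque


def layer_by_distance(adj, labeled):
    """Group unlabeled nodes by hop distance to the nearest labeled node.

    adj: dict mapping each node to a list of its neighbors (hypothetical format).
    labeled: iterable of labeled nodes.
    Returns a list of node lists, nearest layer first.
    """
    # Multi-source breadth-first search starting from every labeled node.
    dist = {n: 0 for n in labeled}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for node, d in dist.items():
        if d > 0:  # skip the labeled nodes themselves
            layers.setdefault(d, []).append(node)
    return [sorted(layers[d]) for d in sorted(layers)]


def sequential_propagate(adj, labels):
    """Score unlabeled nodes layer by layer, nearest layer first.

    labels: dict mapping labeled nodes to a score in [0, 1] for one function.
    Assumption: uniform neighbor averaging replaces the LNP weights.
    """
    scores = dict(labels)
    for layer in layer_by_distance(adj, labels):
        # Score the whole layer from nodes scored in earlier layers only.
        new_scores = {}
        for node in layer:
            known = [scores[m] for m in adj[node] if m in scores]
            new_scores[node] = sum(known) / len(known)
        scores.update(new_scores)
    return scores


# Toy path graph a - b - c - d - e with both endpoints labeled.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = sequential_propagate(adj, {"a": 1.0, "e": 0.0})
print(scores["c"])  # 0.5
```

In this sketch, nodes unreachable from any labeled node receive no score, and each layer draws only on layers already processed, so predictions nearest the labeled proteins are made first and then reused farther out.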