L^γ-PageRank for semi-supervised learning

PageRank for semi-supervised learning has been shown to leverage data structure and a limited number of labeled examples to yield meaningful classification. Despite these successes, classification performance can still be improved, particularly for graphs with unclear clusters or unbalanced labeled data. To address these limitations, a novel approach based on powers of the Laplacian matrix L^γ (γ > 0), referred to as L^γ-PageRank, is proposed. Its theoretical study shows that it operates on signed graphs, where nodes belonging to the same class are more likely to share positive edges, while nodes from different classes are more likely to be connected by negative edges. It is shown that selecting an optimal γ can significantly enhance classification performance. A procedure for the automated estimation of the optimal γ from a single observation of the data is devised and assessed. Experiments on several datasets demonstrate the effectiveness of both L^γ-PageRank classification and the optimal γ estimation.
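The signed-graph claim can be illustrated on a toy example. The sketch below is not taken from the paper: it assumes the combinatorial Laplacian L = D - A, computes its power through an eigendecomposition, and reads the off-diagonal entries of L^γ (with flipped sign) as edge weights of an implied graph. The function names `laplacian_power` and `signed_weights` and the 4-node path graph are hypothetical choices made for illustration only.

```python
import numpy as np


def laplacian_power(adjacency, gamma):
    """Return L^gamma for the combinatorial Laplacian L = D - A (gamma > 0).

    Assumes a symmetric adjacency matrix, so L is symmetric positive
    semidefinite and L^gamma = U diag(lambda^gamma) U^T.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)
    # Clip tiny negative eigenvalues caused by round-off before taking powers.
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals**gamma) @ eigvecs.T


def signed_weights(L_gamma):
    """Split L^gamma as D_gamma - W_gamma and return W_gamma.

    D_gamma is the diagonal of L^gamma; the entries of W_gamma may be
    positive or negative, which is what makes the implied graph signed.
    """
    D_gamma = np.diag(np.diag(L_gamma))
    return D_gamma - L_gamma


# Toy example (assumption): unweighted path graph 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

L = np.diag(A.sum(axis=1)) - A
L2 = laplacian_power(A, 2.0)
W2 = signed_weights(L2)

print(np.round(W2, 6))
```

In this toy graph with γ = 2, adjacent nodes 0 and 1 end up with a positive weight, while nodes 0 and 2, which are two hops apart, acquire a negative weight. How such signs relate to class membership, and how γ should be chosen, is the subject of the paper's analysis and is not reproduced here.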
