Fast and Accurate Anchor Graph-based Label Prediction

Anchor graphs are a popular tool for label prediction on sparsely labeled data. In an anchor graph, the labels of labeled data are propagated to unlabeled data via anchor points, which are the centers of k-means clusters. Anchor graph-based label prediction determines local weights between data points and anchor points by exploiting Nesterov's method to obtain the graph's adjacency matrix, and it inverts a matrix obtained from the adjacency matrix to predict labels. This approach, however, incurs high computation cost since (1) Nesterov's method is applied to all closest anchor points to compute local weights, and (2) the computation cost of the matrix inversion is cubic in the number of anchor points. We propose an approach that performs anchor graph-based label prediction efficiently through two key advances: (1) it prunes unnecessary anchor points so they are not passed to Nesterov's method, and (2) it applies the conjugate gradient method in computing labels of data points to avoid matrix inversion. In addition, we propose to exploit basis vectors computed by SVD as anchor points to improve label prediction accuracy. Experiments show that our approach outperforms the previous approaches in terms of efficiency and accuracy.
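The second advance, replacing matrix inversion with the conjugate gradient method, can be illustrated with a minimal sketch. It assumes that label prediction reduces to solving a linear system M A = B for a symmetric positive-definite matrix M whose size equals the number of anchor points. The abstract does not state the exact matrix, so `M` and `B` below are random stand-ins, and the function name `conjugate_gradient` and the sizes `m` and `c` are hypothetical choices made for illustration only.

```python
import numpy as np

def conjugate_gradient(M, b, tol=1e-10, max_iter=1000):
    """Solve M x = b for a symmetric positive-definite M without forming its inverse."""
    x = np.zeros_like(b, dtype=float)
    r = b - M @ x              # residual of the current estimate
    p = r.copy()               # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break              # residual norm is small enough
        Mp = M @ p             # the only use of M: one matrix-vector product
        alpha = rs_old / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Hypothetical sizes: 20 anchor points, 3 classes.
rng = np.random.default_rng(0)
m, c = 20, 3
G = rng.standard_normal((m, m))
M = G.T @ G + m * np.eye(m)        # stand-in SPD matrix (assumption, not the paper's matrix)
B = rng.standard_normal((m, c))    # stand-in right-hand side, one column per class

# Solve one system per class column instead of computing inv(M).
A_cg = np.column_stack([conjugate_gradient(M, B[:, j]) for j in range(c)])

# Reference solution by explicit inversion, used here only for comparison.
A_inv = np.linalg.inv(M) @ B
max_diff = float(np.max(np.abs(A_cg - A_inv)))
```

Each iteration touches `M` only through a matrix-vector product, which is quadratic in the number of anchor points for a dense matrix, whereas explicit inversion is cubic. How many iterations are needed depends on the conditioning of the actual matrix, which this sketch does not model.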
