Learning on Big Graph: Label Inference and Regularization with Anchor Hierarchy

Several models, such as Anchor Graph Regularization (AGR), have been proposed to cope with the rapidly increasing size of data. AGR significantly accelerates graph-based learning by exploiting a set of anchors. However, when a dataset becomes much larger, AGR still faces a large graph, which dramatically increases the computational cost. To overcome this issue, we propose a novel Hierarchical Anchor Graph Regularization (HAGR) approach that exploits multiple layers of anchors arranged in a pyramid-style structure. In HAGR, the labels of datapoints are inferred from the coarsest anchors, layer by layer, in a coarse-to-fine manner. The label smoothness regularization is performed on all datapoints, and we demonstrate that the optimization process involves only a small reduced Laplacian matrix. We also introduce a fast approach to constructing the hierarchical anchor graph based on an approximate nearest neighbor search technique. Experiments on million-scale datasets demonstrate the effectiveness and efficiency of the proposed HAGR approach over existing methods. The results show that HAGR achieves good performance within 3 minutes on an 8-million-example classification task.
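The coarse-to-fine inference and the reduced Laplacian can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in the paper, and every concrete choice in it is an assumption made for illustration: anchors are drawn by random subsampling, inter-layer weights come from an exact brute-force nearest-anchor search (standing in for the approximate search mentioned above), the data graph is taken in the AGR style as W = Z01 diag(colsum)^-1 Z01^T, and the names `knn_weights`, `hagr_sketch`, `layer_sizes`, `s`, and `lam` are hypothetical.

```python
import numpy as np


def knn_weights(points, anchors, s=3):
    """Row-normalized kernel weights from each point to its s nearest anchors."""
    # Exact brute-force search; a stand-in for approximate nearest neighbor search.
    d2 = ((points[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(len(points))[:, None]
    near = d2[rows, idx]
    bandwidth = near.mean(axis=1, keepdims=True) + 1e-12  # per-point scale (assumed)
    w = np.exp(-near / bandwidth)
    Z = np.zeros_like(d2)
    Z[rows, idx] = w / w.sum(axis=1, keepdims=True)
    return Z


def hagr_sketch(X, labeled_idx, y_labeled, n_classes,
                layer_sizes=(60, 20), s=3, lam=0.01, seed=0):
    """Infer labels for all rows of X from the coarsest anchor layer."""
    rng = np.random.default_rng(seed)
    # Pyramid of anchor layers, finest to coarsest (random subsampling is assumed).
    layers = [X]
    for m in layer_sizes:
        prev = layers[-1]
        layers.append(prev[rng.choice(len(prev), size=m, replace=False)])
    # Inter-layer weights: data -> layer 1, layer 1 -> layer 2, and so on.
    Zs = [knn_weights(layers[i], layers[i + 1], s) for i in range(len(layer_sizes))]
    Z01 = Zs[0]
    ZH = Z01
    for Z in Zs[1:]:
        ZH = ZH @ Z  # maps coarsest-anchor labels down to the datapoints
    # Reduced Laplacian ZH^T (I - W) ZH, computed without forming the n x n matrix W.
    col = np.maximum(Z01.sum(axis=0), 1e-12)
    P = ZH.T @ Z01
    L_red = ZH.T @ ZH - (P / col) @ P.T
    # Regularized least squares for the coarsest-anchor label matrix A.
    Y = np.eye(n_classes)[y_labeled]
    ZL = ZH[labeled_idx]
    m_h = ZH.shape[1]
    A = np.linalg.solve(ZL.T @ ZL + lam * L_red + 1e-8 * np.eye(m_h), ZL.T @ Y)
    return (ZH @ A).argmax(axis=1)


# Toy usage: two well-separated Gaussian blobs with 10 labeled points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5.0, 1.0, size=(150, 2)),
               rng.normal(5.0, 1.0, size=(150, 2))])
y = np.repeat([0, 1], 150)
labeled_idx = np.concatenate([np.arange(10), np.arange(150, 160)])
pred = hagr_sketch(X, labeled_idx, y[labeled_idx], n_classes=2)
accuracy = (pred == y).mean()
print(f"accuracy on toy blobs: {accuracy:.3f}")
```

In this sketch the only linear system solved has the size of the coarsest layer (20 x 20 here), independent of the number of datapoints, which is the property the reduced Laplacian is meant to provide. The dense distance matrix in `knn_weights` is suitable only for toy data; at scale, sparse matrices and an approximate search would be needed.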
