Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation

Traditional graph-based semi-supervised learning (SSL) approaches, though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges $|E|$ and the number of distinct labels $m$. To deal with the large label size problem, recent works propose sketch-based methods that approximate the label distribution per node, achieving a space reduction from $O(m)$ to $O(\log m)$ under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that captures the sparsity of the label distribution, ensures accurate label propagation, and further reduces the space complexity per node to $O(1)$. We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms, with a significant reduction in memory footprint. We also study different graph construction mechanisms for natural language applications and propose a robust graph augmentation strategy, trained using state-of-the-art unsupervised deep learning architectures, that yields further significant quality gains.
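The abstract does not spell out the update rule, but the core idea of exploiting label sparsity can be illustrated with a minimal sketch: instead of storing and propagating a full $m$-dimensional label distribution at every node, each node keeps only a fixed-size truncated (top-$k$) distribution, so per-node memory stays constant regardless of $m$. The graph, seed labels, parameter names, and blending scheme below are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch (not the paper's algorithm): one label-propagation sweep that
# keeps only a fixed-size truncated label distribution per node, so memory per
# node is O(1) instead of O(m). All names and constants here are assumptions.
from collections import defaultdict

K = 3  # max labels retained per node (constant, independent of m)

def truncate(dist, k=K):
    """Keep only the k highest-weight labels and renormalize."""
    top = dict(sorted(dist.items(), key=lambda kv: -kv[1])[:k])
    z = sum(top.values()) or 1.0
    return {label: w / z for label, w in top.items()}

def propagate_once(edges, seed_labels, node_dists, alpha=0.6):
    """One synchronous update: aggregate neighbors' truncated distributions,
    blend in a node's seed label if it has one, then re-truncate."""
    neighbors = defaultdict(list)
    for u, v, w in edges:              # undirected weighted edges
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    new_dists = {}
    for node, nbrs in neighbors.items():
        agg = defaultdict(float)
        for nbr, w in nbrs:
            for label, p in node_dists.get(nbr, {}).items():
                agg[label] += w * p
        if node in seed_labels:        # supervised seed gets an extra boost
            agg[seed_labels[node]] += alpha * sum(w for _, w in nbrs)
        new_dists[node] = truncate(agg)
    return new_dists

# Toy usage: labels spread from the two seeded nodes across the chain.
edges = [("a", "b", 1.0), ("b", "c", 0.5), ("c", "d", 1.0)]
seeds = {"a": "sports", "d": "politics"}
dists = {n: {lbl: 1.0} for n, lbl in seeds.items()}
for _ in range(5):
    dists = propagate_once(edges, seeds, dists)
print(dists)
```

A distributed variant would run the same per-node update as a message-passing step (e.g., in a Pregel-style system), with each node emitting only its small truncated distribution to its neighbors.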
