Graph-Based Semi-Supervised Learning

While labeled data is expensive to prepare, ever increasing amounts of unlabeled data is becoming widely available. In order to adapt to this phenomenon, several semi-supervised learning (SSL) algorithms, which learn from labeled as well as unlabeled data, have been developed. In a separate line of work, researchers have started to realize that graphs provide a natural way to represent data in a variety of domains. Graph-based SSL algorithms, which bring together these two lines of work, have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision, natural language processing, and other areas of Artificial Intelligence. Recognizing this promising and emerging area of research, this synthesis lecture focuses on graph-based SSL algorithms (e.g., label propagation methods). Our hope is that after reading this book, the reader will walk away with the following: (1) an in-depth knowledge of the current state-of-the-art in graph-based SSL algorithms, and the ability to implement them; (2) the ability to decide on the suitability of graph-based SSL methods for a problem; and (3) familiarity with different applications where graph-based SSL methods have been successfully applied. Table of Contents: Introduction / Graph Construction / Learning and Inference / Scalability / Applications / Future Work / Bibliography / Authors' Biographies / Index

[1]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[2]  John Blitzer,et al.  Domain Adaptation with Structural Correspondence Learning , 2006, EMNLP.

[3]  Katrin Kirchhoff,et al.  Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP , 2007, NAACL.

[4]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[5]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[6]  John D. Lafferty,et al.  Semi-supervised learning using randomized mincuts , 2004, ICML.

[7]  Hal Daumé,et al.  Fast Large-Scale Approximate Graph Construction for NLP , 2012, EMNLP.

[8]  Serge Abiteboul,et al.  PARIS: Probabilistic Alignment of Relations, Instances, and Schema , 2011, Proc. VLDB Endow..

[9]  Partha Pratim Talukdar,et al.  Automatically incorporating new sources in keyword search-based data integration , 2010, SIGMOD Conference.

[10]  Jeff A. Bilmes,et al.  Semi-Supervised Learning with Measure Propagation , 2011, J. Mach. Learn. Res..

[11]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[12]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[13]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2006, IEEE Transactions on Knowledge and Data Engineering.

[14]  Tong Zhang,et al.  Graph-Based Semi-Supervised Learning and Spectral Kernel Design , 2008, IEEE Transactions on Information Theory.

[15]  Tom M. Mitchell,et al.  Acquiring temporal constraints between relations , 2012, CIKM.

[16]  Thomas Hofmann,et al.  Semi-supervised Learning on Directed Graphs , 2004, NIPS.

[17]  Koby Crammer,et al.  New Regularized Algorithms for Transductive Learning , 2009, ECML/PKDD.

[18]  Tom M. Mitchell,et al.  Using unlabeled data to improve text classification , 2001 .

[19]  Victor Zue,et al.  Speech database development at MIT: Timit and beyond , 1990, Speech Commun..

[20]  D. Hosmer A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions Under Three Different Types of Sample , 1973 .

[21]  Daniel A. Spielman,et al.  Fitting a graph to vector data , 2009, ICML '09.

[22]  Sasha Blair-Goldensohn,et al.  The viability of web-derived polarity lexicons , 2010, NAACL.

[23]  M. Griebel,et al.  Semi-supervised learning with sparse grids , 2005, ICML 2005.

[24]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[25]  Jason Weston,et al.  Large scale manifold transduction , 2008, ICML '08.

[26]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[27]  Sebastian Thrun,et al.  Learning to Classify Text from Labeled and Unlabeled Documents , 1998, AAAI/IAAI.

[28]  Sasha Blair-Goldensohn,et al.  Building a Sentiment Summarizer for Local Service Reviews , 2008 .

[29]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[30]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[31]  Tom M. Mitchell,et al.  PIDGIN: ontology alignment using web text as interlingua , 2013, CIKM.

[32]  Josef van Genabith,et al.  QuestionBank: Creating a Corpus of Parse-Annotated Questions , 2006, ACL.

[33]  Ulrike von Luxburg,et al.  Influence of graph construction on graph-based clustering measures , 2008, NIPS.

[34]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[35]  Heikki Mannila,et al.  Random projection in dimensionality reduction: applications to image and text data , 2001, KDD '01.

[36]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[37]  Shankar Kumar,et al.  Video suggestion and discovery for youtube: taking random walks through the view graph , 2008, WWW.

[38]  Thorsten Joachims,et al.  Transductive Learning via Spectral Graph Partitioning , 2003, ICML.

[39]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[40]  Partha Pratim Talukdar,et al.  Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition , 2010, ACL.

[41]  Stephen J. Wright,et al.  Dissimilarity in Graph-Based Semi-Supervised Classification , 2007, AISTATS.

[42]  Shih-Fu Chang,et al.  Graph transduction via alternating minimization , 2008, ICML '08.

[43]  John J. Godfrey,et al.  SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[44]  H. J. Scudder,et al.  Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.

[45]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[46]  Alexander Zien,et al.  Label Propagation and Quadratic Criterion , 2006 .

[47]  Gerhard Weikum,et al.  YAGO2: exploring and querying world knowledge in time, space, context, and many languages , 2011, WWW.

[48]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[49]  Hanna M. Wallach,et al.  Efficient Training of Conditional Random Fields , 2002 .

[50]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[51]  New York Dover,et al.  ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .

[52]  Paramveer S. Dhillon,et al.  Inference Driven Metric Learning for Graph Construction , 2010 .

[53]  Jeff A. Bilmes,et al.  On the semi-supervised learning of multi-layered perceptrons , 2009, INTERSPEECH.

[54]  Bert Huang,et al.  Loopy Belief Propagation for Bipartite Maximum Weight b-Matching , 2007, AISTATS.

[55]  Noah A. Smith,et al.  Semi-Supervised Frame-Semantic Parsing for Unknown Predicates , 2011, ACL.

[56]  Ivor W. Tsang,et al.  Large-Scale Sparsified Manifold Regularization , 2006, NIPS.

[57]  William W. Cohen,et al.  Language-Independent Set Expansion of Named Entities Using the Web , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[58]  Thorsten Brants,et al.  A Context Pattern Induction Method for Named Entity Extraction , 2006, CoNLL.

[59]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[60]  Pascal O. Vontobel,et al.  A generalized Blahut-Arimoto algorithm , 2003, IEEE International Symposium on Information Theory, 2003. Proceedings..

[61]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[62]  Katrin Kirchhoff,et al.  Phonetic Classification Using Controlled Random Walks , 2011, INTERSPEECH.

[63]  Rong Jin,et al.  Regularized Distance Metric Learning: Theory and Algorithm , 2009, NIPS.

[64]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[65]  Jérôme Euzenat,et al.  Ontology Matching: State of the Art and Future Challenges , 2013, IEEE Transactions on Knowledge and Data Engineering.

[66]  Wei Liu,et al.  Large Graph Construction for Scalable Semi-Supervised Learning , 2010, ICML.

[67]  Tommi S. Jaakkola,et al.  Partially labeled classification with Markov random walks , 2001, NIPS.

[68]  Michael Griebel,et al.  Data mining with sparse grids using simplicial basis functions , 2001, KDD '01.

[69]  Raymond J. Mooney,et al.  Integrating constraints and metric learning in semi-supervised clustering , 2004, ICML.

[70]  Jeff A. Bilmes,et al.  Soft-Supervised Learning for Text Classification , 2008, EMNLP.

[71]  John Langford,et al.  Cover trees for nearest neighbor , 2006, ICML.

[72]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[73]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[74]  Ran El-Yaniv,et al.  Distributional Word Clusters vs. Words for Text Categorization , 2003, J. Mach. Learn. Res..

[75]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[76]  Partha Pratim Talukdar,et al.  Topics in Graph Construction for Semi-Supervised Learning , 2009 .

[77]  W. Cheney,et al.  Proximity maps for convex sets , 1959 .

[78]  Xiaojin Zhu,et al.  Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization , 2006 .

[79]  Bernhard Schölkopf,et al.  Cluster Kernels for Semi-Supervised Learning , 2002, NIPS.

[80]  Noah A. Smith,et al.  Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties , 2012, NAACL.

[81]  Slav Petrov,et al.  Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections , 2011, ACL.

[82]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[83]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[84]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[85]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[86]  Kilian Q. Weinberger,et al.  Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[87]  Avrim Blum,et al.  Learning from Labeled and Unlabeled Data using Graph Mincuts , 2001, ICML.

[88]  James R. Glass,et al.  Heterogeneous acoustic measurements for phonetic classification 1 , 1997, EUROSPEECH.

[89]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields for Relational Learning , 2007 .

[90]  Claire Cardie,et al.  Adapting a Polarity Lexicon using Integer Linear Programming for Domain-Specific Sentiment Classification , 2009, EMNLP.

[91]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[92]  Andrew Errity,et al.  An investigation of manifold learning for speech analysis , 2006, INTERSPEECH.

[93]  Wei Li,et al.  Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons , 2003, CoNLL.

[94]  Noah A. Smith,et al.  Probabilistic Frame-Semantic Parsing , 2010, NAACL.

[95]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[96]  Eric Crestan,et al.  Web-Scale Distributional Similarity and Entity Set Expansion , 2009, EMNLP.

[97]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[98]  Sasha Blair-Goldensohn,et al.  Sentiment Summarization: Evaluating and Learning User Preferences , 2009, EACL.

[99]  Benjamin Van Durme,et al.  Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction , 2008, AAAI.

[100]  Daisy Zhe Wang,et al.  WebTables: exploring the power of tables on the web , 2008, Proc. VLDB Endow..

[101]  G. McLachlan,et al.  Updating a discriminant function in basis of unclassified data , 1982 .

[102]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[103]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[104]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[105]  Partha Pratim Talukdar,et al.  Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch , 2013, AISTATS.

[106]  Bernhard Schölkopf,et al.  Learning from labeled and unlabeled data on a directed graph , 2005, ICML.

[107]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..

[108]  Jon Louis Bentley,et al.  Multidimensional divide-and-conquer , 1980, CACM.

[109]  Changshui Zhang,et al.  Knowledge Transfer on Hybrid Graph , 2009, IJCAI.

[110]  Katrin Kirchhoff,et al.  Graph-based Learning for Statistical Machine Translation , 2009, NAACL.

[111]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[112]  Koji Tsuda,et al.  Propagating distributions on a hypergraph by dual information regularization , 2005, ICML.

[113]  Xiaojin Zhu,et al.  Introduction to Semi-Supervised Learning , 2009, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[114]  J. Bilmes,et al.  Scaling Up Machine Learning: Parallel Graph-Based Semi-Supervised Learning , 2011 .

[115]  Partha Pratim Talukdar,et al.  Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks , 2008, EMNLP.

[116]  Nathan Srebro,et al.  Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data , 2009, NIPS.

[117]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[118]  Lars Backstrom,et al.  Balanced label propagation for partitioning massive graphs , 2013, WSDM.

[119]  Adrian Corduneanu,et al.  On Information Regularization , 2002, UAI.

[120]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[121]  Katrin Kirchhoff,et al.  Graph-based learning for phonetic classification , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[122]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[123]  Inderjit S. Dhillon,et al.  Information-theoretic metric learning , 2006, ICML '07.

[124]  Daniel Jurafsky,et al.  Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[125]  Zoubin Ghahramani,et al.  Learning from labeled and unlabeled data with label propagation , 2002 .

[126]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[127]  Koby Crammer,et al.  Graph-Based Transduction with Confidence , 2012, ECML/PKDD.

[128]  John Langford,et al.  Scaling up machine learning: parallel and distributed approaches , 2011, KDD '11 Tutorials.

[129]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[130]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[131]  Slav Petrov,et al.  Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models , 2010, EMNLP.

[132]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[133]  Ellen Riloff,et al.  A Bootstrapping Method for Learning Semantic Lexicons using Extraction Pattern Contexts , 2002, EMNLP.

[134]  Ben Taskar,et al.  Graph-Based Posterior Regularization for Semi-Supervised Structured Prediction , 2013, CoNLL.

[135]  Mikhail Belkin,et al.  Maximum Margin Semi-Supervised Learning for Structured Variables , 2005, NIPS 2005.

[136]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[137]  Jeff A. Bilmes,et al.  Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification , 2009, NIPS.

[138]  John Langford,et al.  Hash Kernels for Structured Data , 2009, J. Mach. Learn. Res..

[139]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[140]  Delip Rao,et al.  Semi-Supervised Polarity Lexicon Induction , 2009, EACL.

[141]  Nicolas Le Roux,et al.  Efficient Non-Parametric Function Induction in Semi-Supervised Learning , 2004, AISTATS.

[142]  Xiaojin Zhu,et al.  Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning , 2005, ICML.

[143]  Koby Crammer,et al.  Inference Driven Metric Learning (IDML) for Graph Construction , 2010 .

[144]  Shih-Fu Chang,et al.  Graph construction and b-matching for semi-supervised learning , 2009, ICML '09.