Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

In this paper, we present an overview of generalized expectation criteria (GE), a simple, robust, and scalable method for semi-supervised training using weakly labeled data. GE fits model parameters by favoring models whose expectations on the unlabeled data match given constraints, such as marginal label distributions. This paper shows how to apply generalized expectation criteria to two classes of parametric models: maximum entropy models and conditional random fields. Experimental results demonstrate accuracy improvements for these models over supervised training and over several other state-of-the-art semi-supervised learning methods.
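For the maximum entropy case, the idea of matching a marginal label distribution can be illustrated with a small sketch. It assumes one concrete GE term: the model's average predicted label distribution p_hat on the unlabeled inputs is compared with a target distribution t via lam * KL(t || p_hat), and this penalty is subtracted from the labeled log-likelihood together with a Gaussian prior on the weights. The choice of KL divergence, plain gradient ascent, the hyperparameter values, the toy data, and all function names (`predict`, `model_marginal`, `gradient`, `train`, etc.) are illustrative assumptions, not the paper's implementation.

```python
import math

# Minimal sketch (assumption, not the paper's code): GE training of a
# multinomial logistic regression (maximum entropy) classifier whose
# GE term is -lam * KL(target || average predicted label distribution
# on the unlabeled inputs), plus a Gaussian prior on the weights.


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict(W, x):
    """Return p(y | x) for every class y; W[k] is the weight vector of class k."""
    return softmax([sum(w * xj for w, xj in zip(w_k, x)) for w_k in W])


def model_marginal(W, unlabeled):
    """Average predicted label distribution over the unlabeled inputs."""
    n = len(unlabeled)
    totals = [0.0] * len(W)
    for x in unlabeled:
        for k, p in enumerate(predict(W, x)):
            totals[k] += p / n
    return totals


def kl(target, model):
    """KL(target || model); classes with zero target mass contribute nothing."""
    return sum(t * math.log(t / m) for t, m in zip(target, model) if t > 0)


def objective(W, labeled, unlabeled, target, lam, sigma2):
    """Labeled log-likelihood minus the GE penalty minus the Gaussian prior."""
    loglik = sum(math.log(predict(W, x)[y]) for x, y in labeled)
    prior = sum(w * w for w_k in W for w in w_k) / (2.0 * sigma2)
    return loglik - lam * kl(target, model_marginal(W, unlabeled)) - prior


def gradient(W, labeled, unlabeled, target, lam, sigma2):
    """Analytic gradient of objective() with respect to every weight."""
    K, D = len(W), len(W[0])
    grad = [[-W[k][j] / sigma2 for j in range(D)] for k in range(K)]
    # Labeled term: (indicator[y == k] - p(k|x)) * x for each labeled pair.
    for x, y in labeled:
        p = predict(W, x)
        for k in range(K):
            coef = (1.0 if k == y else 0.0) - p[k]
            for j in range(D):
                grad[k][j] += coef * x[j]
    # GE term: with r_y = target_y / p_hat_y, the gradient of -lam * KL is
    # lam / n * sum_x p(k|x) * (r_k - sum_y r_y p(y|x)) * x.
    p_hat = model_marginal(W, unlabeled)
    r = [t / m for t, m in zip(target, p_hat)]
    n = len(unlabeled)
    for x in unlabeled:
        p = predict(W, x)
        expected_r = sum(r_y * p_y for r_y, p_y in zip(r, p))
        for k in range(K):
            coef = lam * p[k] * (r[k] - expected_r) / n
            for j in range(D):
                grad[k][j] += coef * x[j]
    return grad


def train(labeled, unlabeled, target, num_features,
          lam=5.0, sigma2=10.0, lr=0.02, steps=1000):
    """Plain gradient ascent from all-zero weights (illustrative settings)."""
    W = [[0.0] * num_features for _ in range(len(target))]
    for _ in range(steps):
        g = gradient(W, labeled, unlabeled, target, lam, sigma2)
        W = [[w + lr * gw for w, gw in zip(w_k, g_k)] for w_k, g_k in zip(W, g)]
    return W


if __name__ == "__main__":
    # Toy data: feature vectors are [bias, f]; one labeled example per class.
    labeled = [([1.0, 2.0], 0), ([1.0, -2.0], 1)]
    unlabeled = [[1.0, f] for f in (1.5, 1.0, 0.5, -0.5, -1.0)]
    target = [0.6, 0.4]  # assumed prior knowledge: 60% of inputs are class 0
    W = train(labeled, unlabeled, target, num_features=2)
    print("model label distribution on unlabeled data:", model_marginal(W, unlabeled))
```

Setting `lam=0` removes the GE term and reduces the sketch to ordinary supervised maximum entropy training with a Gaussian prior, which gives a simple baseline to compare against.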
