Generalized Expectation Criteria for Semi-Supervised Learning with Weakly Labeled Data

In this paper, we present an overview of generalized expectation criteria (GE), a simple, robust, scalable method for semi-supervised training using weakly labeled data. GE fits model parameters by favoring models that match certain expectation constraints, such as marginal label distributions, on the unlabeled data. This paper shows how to apply generalized expectation criteria to two classes of parametric models: maximum entropy models and conditional random fields. Experimental results demonstrate accuracy improvements over supervised training and over several other state-of-the-art semi-supervised learning methods for these models.

Gideon S. Mann | Andrew McCallum
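The following minimal Python sketch illustrates the abstract's idea for a maximum entropy (multinomial logistic regression) model. It rests on three assumptions. First, each expectation constraint is a target label distribution for the unlabeled instances on which one binary feature is active. Second, the model is scored by the negative KL divergence between each target and the model's average predicted label distribution on those instances, plus a Gaussian prior on the weights. Third, a marginal-label-distribution constraint is treated as the special case of an always-on bias feature.

The function names (`predict`, `ge_objective_and_grad`, `train_ge`) and the toy data are hypothetical, and they are not the paper's implementation. Plain gradient ascent is used only to keep the sketch dependency-free. The conditional random field case, which needs expectations over label sequences, is not covered.

```python
import math
from typing import List, Sequence, Tuple

# Minimal sketch under stated assumptions; names are hypothetical, not from the paper.
# theta[k][f] is the weight of feature f for label k in a maxent model.
# A constraint (feat, target) says: "on unlabeled instances where feature
# `feat` is active, the label distribution should be `target`".
Matrix = List[List[float]]
Constraint = Tuple[int, List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict(theta: Matrix, x: Sequence[float]) -> List[float]:
    """p(y | x; theta) for every label y."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in theta])


def ge_objective_and_grad(theta: Matrix, unlabeled: List[List[float]],
                          constraints: List[Constraint], sigma2: float = 10.0):
    """Objective -sum_c KL(target_c || model_c) - ||theta||^2 / (2 * sigma2)
    and its gradient with respect to theta."""
    n_labels, n_feats = len(theta), len(theta[0])
    obj = 0.0
    grad = [[0.0] * n_feats for _ in range(n_labels)]
    for feat, target in constraints:
        subset = [x for x in unlabeled if x[feat] != 0]
        if not subset:
            continue  # no unlabeled evidence for this constraint
        n = len(subset)
        probs = [predict(theta, x) for x in subset]
        # Model expectation: average predicted label distribution on the subset.
        expected = [sum(p[y] for p in probs) / n for y in range(n_labels)]
        obj -= sum(t * math.log(t / e) for t, e in zip(target, expected) if t > 0)
        ratio = [t / e for t, e in zip(target, expected)]
        for x, p in zip(subset, probs):
            # d(-KL)/d theta[k][f] = (1/n) sum_x x_f p(k|x) (ratio_k - sum_y ratio_y p(y|x))
            avg_ratio = sum(r * py for r, py in zip(ratio, p))
            for k in range(n_labels):
                coef = p[k] * (ratio[k] - avg_ratio) / n
                for f in range(n_feats):
                    grad[k][f] += coef * x[f]
    # Gaussian prior on the parameters (an L2 penalty).
    for k in range(n_labels):
        for f in range(n_feats):
            obj -= theta[k][f] ** 2 / (2 * sigma2)
            grad[k][f] -= theta[k][f] / sigma2
    return obj, grad


def train_ge(unlabeled, constraints, n_labels, steps=200, lr=0.5, sigma2=10.0):
    """Maximize the GE objective with plain gradient ascent, starting at theta = 0."""
    theta = [[0.0] * len(unlabeled[0]) for _ in range(n_labels)]
    for _ in range(steps):
        _, grad = ge_objective_and_grad(theta, unlabeled, constraints, sigma2)
        theta = [[w + lr * g for w, g in zip(row, g_row)]
                 for row, g_row in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Hypothetical toy data: features = [bias, word_a, word_b]; no instance is labeled.
    unlabeled = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
    # Hypothetical prior knowledge: word_a mostly signals label 0, word_b label 1.
    constraints = [(1, [0.9, 0.1]), (2, [0.1, 0.9])]
    theta = train_ge(unlabeled, constraints, n_labels=2)
    print(predict(theta, [1, 1, 0]), predict(theta, [1, 0, 1]))
```

The objective compares only aggregate predictions with the targets, so no individual instance needs a label. When some labeled examples are available, one could also add this term to their conditional log-likelihood.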