Regularizing Structured Classifier with Conditional Probabilistic Constraints for Semi-supervised Learning

Constraints have been shown to be an effective way to incorporate unlabeled data for semi-supervised structured classification. We recognize that constraints are often conditional and probabilistic; moreover, a constraint's condition can depend either on observations alone (which we call an x-type constraint) or also on hidden variables (which we call a y-type constraint). We wish to design a constraint formulation that can flexibly model the constraint probability for both x-type and y-type constraints, and then use it to regularize general structured classifiers for semi-supervised learning. Surprisingly, none of the existing models has such a constraint formulation. In this paper, we therefore propose a new conditional probabilistic formulation for modeling both x-type and y-type constraints. We also recognize the inference complication that y-type constraints introduce, and propose a systematic selective evaluation approach to realize the constraints efficiently. Finally, we evaluate our model in three applications: named entity recognition, part-of-speech tagging, and entity information extraction, on nine data sets in total. We show that our model is generally more accurate and efficient than the state-of-the-art baselines. Our code and data are available at https://bitbucket.org/vwz/cikm2016-cpf/.
