Constraint-Driven Training of Complex Models Using MCMC

Standard machine learning approaches require labeled data, but labeling data for each task, language, and domain of interest is not feasible. Consequently, there has been much interest in developing training algorithms that can leverage constraints from prior knowledge to augment or replace labeled data. Most previous work in this area assumes that efficient inference algorithms exist for the model being trained. For many NLP tasks of interest, such as entity resolution, however, complex models that require approximate inference are advantageous. In this paper we study algorithms for training complex models using constraints from prior knowledge. We propose an MCMC-based approximation to Generalized Expectation (GE) training and compare it to Constraint-Driven SampleRank (CDSR). Sequence labeling experiments demonstrate that MCMC GE closely approximates exact GE, and that GE can substantially outperform CDSR. We then apply these methods to train densely-connected citation resolution models. Both methods yield highly accurate models (up to 94% mean pairwise F1) with only two simple constraints.
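
The core computation behind the proposed approximation can be illustrated concretely. GE training scores a model by how closely the expectation of a constraint feature under the model, E_{p_θ(y|x)}[φ(x, y)], matches a target value φ̃ supplied by prior knowledge (e.g., by penalizing a distance such as (φ̃ − E[φ])²); when the model is too densely connected for exact inference, that expectation can instead be estimated as an average over MCMC samples. The following Python sketch shows the idea for a single-site Gibbs sampler. It is a minimal illustration under assumed interfaces, not the paper's implementation; local_score, phi, and the toy data are hypothetical stand-ins.

```python
import math
import random

def gibbs_sample_labels(x, labels, local_score, n_sweeps=200, rng=random):
    """Single-site Gibbs sampler over a label sequence.

    x           : observed token sequence
    labels      : candidate label set
    local_score : local_score(x, y, i, lab) -> unnormalized log-score of
                  setting y[i] = lab given the rest of y (a stand-in for
                  summing the model's factors that touch position i)
    Yields one full label sequence per sweep.
    """
    y = [rng.choice(labels) for _ in x]  # arbitrary initial state
    for _ in range(n_sweeps):
        for i in range(len(x)):
            # Unnormalized conditional p(y_i | y_-i, x) from the local factors,
            # exponentiated with the usual max-trick for numerical stability.
            scores = [local_score(x, y, i, lab) for lab in labels]
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            r, acc = rng.random() * sum(weights), 0.0
            for lab, w in zip(labels, weights):
                acc += w
                if r <= acc:
                    y[i] = lab
                    break
        yield list(y)

def mcmc_constraint_expectation(x, labels, local_score, phi,
                                n_sweeps=200, burn_in=50):
    """Monte Carlo estimate of E_{p(y|x)}[phi(x, y)]: the model expectation
    of a constraint feature, averaged over post-burn-in Gibbs samples."""
    total = count = 0
    for t, y in enumerate(gibbs_sample_labels(x, labels, local_score, n_sweeps)):
        if t >= burn_in:
            total += phi(x, y)
            count += 1
    return total / count

if __name__ == "__main__":
    # Toy model: each label weakly prefers to match its observed token.
    def local_score(x, y, i, lab):
        return 1.0 if lab == x[i] else 0.0

    # Hypothetical constraint feature: fraction of positions labeled "B".
    def phi(x, y):
        return sum(1 for lab in y if lab == "B") / len(y)

    x = ["A", "B", "A", "A", "B"]
    estimate = mcmc_constraint_expectation(x, ["A", "B"], local_score, phi)
    target = 0.4  # assumed target expectation from prior knowledge
    print("E[phi] ~= %.3f, GE penalty ~= %.4f"
          % (estimate, (target - estimate) ** 2))
```

In exact GE the expectation (and the covariance terms in its gradient) would be computed by dynamic programming; replacing it with the sample average above is what lets the criterion be applied to densely-connected models such as the citation resolution factor graphs used in the experiments.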
