Monte Carlo MCMC: Efficient Inference by Approximate Sampling

Conditional random fields and other graphical models have achieved state-of-the-art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure (higher tree-width, larger fan-out, more features, and more data), rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sampling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13-fold speedup over regular MCMC inference.
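
To make the idea of approximating transition probabilities concrete, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes a toy model of binary variables joined by pairwise agreement factors with a single shared weight `w`, a Metropolis-Hastings proposal that flips one variable, and uniform subsampling of `k` neighboring factors rescaled by `degree / k`. All function names, the model, and the subsampling rule are hypothetical choices made for illustration.

```python
import math
import random


def complete_graph(n):
    """Hypothetical toy structure: every variable shares a factor with every other."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def delta_exact(i, state, neighbors, w):
    """Change in log-score from flipping variable i, using every factor on i.

    Each pairwise factor contributes +w when its two variables agree and -w
    otherwise, so flipping i negates each contribution.
    """
    s = sum(w if state[i] == state[j] else -w for j in neighbors[i])
    return -2.0 * s


def delta_sampled(i, state, neighbors, w, k, rng):
    """Estimate the same change from k uniformly sampled factors (assumed rule)."""
    nbrs = neighbors[i]
    if k >= len(nbrs):
        return delta_exact(i, state, neighbors, w)
    sample = rng.sample(nbrs, k)
    s = sum(w if state[i] == state[j] else -w for j in sample)
    # Rescale so the estimate matches the full sum in expectation.
    return -2.0 * s * len(nbrs) / k


def mh_step(state, neighbors, w, k, rng):
    """One Metropolis-Hastings step whose acceptance test uses the sampled estimate."""
    i = rng.randrange(len(state))
    delta = delta_sampled(i, state, neighbors, w, k, rng)
    if delta >= 0 or rng.random() < math.exp(delta):
        state[i] = 1 - state[i]
        return True
    return False
```

Each step in this sketch touches `k` factors rather than the full degree of the flipped variable, which is where any savings on high fan-out models would come from. Because the acceptance test uses a noisy estimate, the chain only approximates the behavior of the exact sampler.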
