Probabilistic models with unknown objects

Humans and other intelligent agents must make inferences about the real-world objects that underlie their observations: for instance, the objects visible in an image, or the people mentioned in a set of text documents. The agent may not know in advance how many objects exist, how they are related to each other, or which observations correspond to which underlying objects. Existing declarative representations for probabilistic models do not capture the structure of such scenarios. This thesis introduces Bayesian logic (BLOG), a first-order probabilistic modeling language that specifies probability distributions over possible worlds with varying sets of objects. A BLOG model contains statements that define conditional probability distributions for a certain set of random variables; the model also specifies certain context-specific independence properties. We provide criteria under which such a model is guaranteed to fully define a probability distribution. These criteria go beyond existing results in that they can be satisfied even when the Bayesian network defined by the model is cyclic, or contains nodes with infinitely many ancestors.

We describe several approximate inference algorithms that exploit the context-specific dependence structure revealed by a BLOG model. First, we present rejection sampling and likelihood weighting algorithms that are guaranteed to converge to the correct probability for any query on a structurally well-defined BLOG model. Because these algorithms instantiate only those variables that are context-specifically relevant, they can generate samples in finite time even when the model defines infinitely many variables. We then define a general framework for inference on BLOG models using Markov chain Monte Carlo (MCMC) algorithms. This framework allows a programmer to plug in a domain-specific proposal distribution, which helps the Markov chain move to high-probability worlds. Furthermore, the chain can operate on partial world descriptions that specify values only for context-specifically relevant variables. We give conditions under which MCMC over such partial world descriptions is guaranteed to converge to correct probabilities. We also show that this framework performs efficiently on a real-world task: reconstructing the set of distinct publications referred to by a set of bibliographic citations.
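The idea of likelihood weighting with lazily instantiated variables can be illustrated with a minimal Python sketch. It is not BLOG syntax and not the thesis's implementation. The toy model is an assumption made for illustration: an urn holds an unknown number of balls (uniform prior on 1 to `max_balls`), each ball is blue or green with equal probability, each draw picks a ball uniformly at random, and the observed color is wrong with probability `noise`. The function name `likelihood_weighting` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def likelihood_weighting(observations, num_samples=20000, max_balls=8,
                         noise=0.2, seed=0):
    """Estimate the posterior over the number of balls in a toy urn model.

    Assumed model (illustrative only): N ~ Uniform{1..max_balls}; each ball's
    true color is blue or green with probability 0.5; each draw selects a
    ball uniformly; the observed color is flipped with probability `noise`.
    """
    rng = random.Random(seed)
    weight_by_n = defaultdict(float)
    total_weight = 0.0
    most_colors_instantiated = 0

    for _ in range(num_samples):
        # Sample the non-evidence variables from their priors.
        n = rng.randint(1, max_balls)
        colors = {}  # partial world: colors only for balls actually drawn
        weight = 1.0
        for observed in observations:
            ball = rng.randrange(n)
            # Instantiate a ball's color only when a draw makes it relevant.
            if ball not in colors:
                colors[ball] = rng.choice(["blue", "green"])
            # Evidence is not sampled; it contributes its likelihood.
            weight *= (1.0 - noise) if colors[ball] == observed else noise
        weight_by_n[n] += weight
        total_weight += weight
        most_colors_instantiated = max(most_colors_instantiated, len(colors))

    posterior = {n: w / total_weight for n, w in sorted(weight_by_n.items())}
    return posterior, most_colors_instantiated


if __name__ == "__main__":
    posterior, most = likelihood_weighting(["blue"] * 10)
    for n, p in posterior.items():
        print(f"P(N={n} | evidence) ~ {p:.3f}")
    print("largest number of ball colors instantiated in one sample:", most)
```

Each sample stores colors only for the balls that were actually drawn, so the work per sample is bounded by the number of observations rather than by the number of objects in the sampled world. In this sketch that is the property that would let sampling terminate even if the prior on the number of balls were unbounded.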
