Learning and inference in collective knowledge bases

Truly intelligent action requires large quantities of knowledge. Acquiring this knowledge has long been the major bottleneck preventing the rapid spread of AI systems. Hand-building comprehensive knowledge bases is slow and costly. Machine learning can be much faster and cheaper, but is limited in the depth and breadth of knowledge it can acquire. The spread of the Internet has made possible a new solution: building large knowledge bases by mass collaboration, combining information from a multitude of sources. While such collective knowledge bases (CKBs) promise a breakthrough in coverage and cost-effectiveness, they can only succeed if the quality, relevance, and consistency of the knowledge are kept at acceptable levels.

This dissertation introduces an architecture for collective knowledge bases that addresses these problems. It operates in two loops of interaction, one with users and one with contributors. Knowledge from contributors is used to answer questions from users, and feedback from users is used to evaluate the knowledge from contributors. By informing contributors what knowledge was used, and what may be lacking, the CKB remains relevant to the needs of its users.

Because they are developed collectively, CKBs must deal with knowledge that is inconsistent and noisy. To be of practical use, they must also be able to handle complex domains involving multiple objects and the relations among them. To meet both requirements, this dissertation introduces Markov logic networks (MLNs), a knowledge representation that combines probability with the full power of first-order logic, and develops algorithms for inference and learning in them. MLNs are robust to inconsistent knowledge because they treat logical statements as soft constraints rather than hard ones: a world that violates a formula in the KB becomes less probable, but not impossible.

The dissertation also provides an approach to combining weaker knowledge from multiple users (specifically, knowledge about which variables in a domain depend on which others), and an approach to determining the quality of contributors when insufficient data from them is available (using a web of trust among them). The utility of CKBs and the associated methods has been demonstrated with experiments in real-world domains, including a public bibliography server (www.bibserv.org), a CKB about a university domain, and a CKB for printer troubleshooting.
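
The soft-constraint reading of formulas can be illustrated with a minimal sketch. It assumes the log-linear form in which MLNs are usually defined, where the probability of a world x is proportional to exp(sum_i w_i n_i(x)), with w_i the weight of formula i and n_i(x) the number of its true groundings in x. The single formula Smokes(A) => Cancer(A), the weight 1.5, and the function name world_probabilities are hypothetical choices made for illustration only; they are not taken from the dissertation, and brute-force enumeration of worlds is not the inference algorithm it develops.

```python
import itertools
import math


def world_probabilities(formulas, num_atoms):
    """Enumerate all truth assignments and return P(world) for each.

    formulas: list of (weight, fn) pairs, where fn maps a tuple of
    booleans (one per ground atom) to True if the formula holds.
    Each formula here has exactly one grounding, so n_i(x) is 0 or 1.
    """
    worlds = list(itertools.product([False, True], repeat=num_atoms))
    # Unnormalized score: exp of the total weight of satisfied formulas.
    scores = [math.exp(sum(w for w, f in formulas if f(x))) for x in worlds]
    z = sum(scores)  # partition function
    return {x: s / z for x, s in zip(worlds, scores)}


# Hypothetical ground atoms: x[0] = Smokes(A), x[1] = Cancer(A).
# Hypothetical formula with assumed weight 1.5: Smokes(A) => Cancer(A).
formulas = [(1.5, lambda x: (not x[0]) or x[1])]
probs = world_probabilities(formulas, num_atoms=2)

violating = probs[(True, False)]   # Smokes(A) true, Cancer(A) false
satisfying = probs[(True, True)]   # formula holds in this world
print(f"P(violating world)  = {violating:.4f}")
print(f"P(satisfying world) = {satisfying:.4f}")
```

In this sketch the violating world keeps a nonzero probability, and each satisfying world is more probable than it by a factor of exp(1.5). Raising the weight pushes the violating world's probability toward zero, which recovers the behavior of a hard constraint in the limit.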
