A Review of Relational Machine Learning for Knowledge Graphs

Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we review how such statistical models can be “trained” on large knowledge graphs and then used to predict new facts about the world (which is equivalent to predicting new edges in the graph). In particular, we discuss two fundamentally different kinds of statistical relational models, both of which can scale to massive data sets. The first is based on latent feature models such as tensor factorization and multiway neural networks. The second is based on mining observable patterns in the graph. We also show how to combine these latent and observable models to get improved modeling power at decreased computational cost. Finally, we discuss how such statistical models of graphs can be combined with text-based information extraction methods to automatically construct knowledge graphs from the Web. As an example of such a combination, we discuss Google's Knowledge Vault project.
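To make the distinction between latent and observable models concrete, the following is a minimal sketch and not the method of any particular system discussed in the paper. It assumes a bilinear latent score of the form e_s^T W_r e_o (one common tensor-factorization-style choice), a shared-neighbor count as the observable pattern, and a simple additive combination. The toy graph, the embeddings, the relation matrix, the function names, and the weight of 0.5 are all hypothetical and chosen only for illustration; in practice the latent parameters would be learned from data.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Toy knowledge graph. Every entity, relation, and number here is invented.
TRIPLES: Set[Triple] = {
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
}

# Hypothetical 2-d latent features; a real model would learn these from data.
EMBEDDINGS: Dict[str, List[float]] = {
    "alice": [1.0, 0.0],
    "bob": [0.9, 0.1],
    "carol": [0.0, 1.0],
}

# Hypothetical per-relation interaction matrix (identity for simplicity).
RELATION_MATRICES: Dict[str, List[List[float]]] = {
    "knows": [[1.0, 0.0], [0.0, 1.0]],
}


def latent_score(s: str, r: str, o: str) -> float:
    """Bilinear score e_s^T W_r e_o computed from latent features."""
    e_s, e_o, w = EMBEDDINGS[s], EMBEDDINGS[o], RELATION_MATRICES[r]
    return sum(
        e_s[i] * w[i][j] * e_o[j]
        for i in range(len(e_s))
        for j in range(len(e_o))
    )


def neighbors(entity: str) -> Set[str]:
    """Entities linked to `entity` by any relation, ignoring direction."""
    outgoing = {o for s, _, o in TRIPLES if s == entity}
    incoming = {s for s, _, o in TRIPLES if o == entity}
    return outgoing | incoming


def observable_score(s: str, o: str) -> float:
    """Observable pattern: number of neighbors shared by s and o."""
    return float(len(neighbors(s) & neighbors(o)))


def combined_score(s: str, r: str, o: str, weight: float = 0.5) -> float:
    """Additive combination of the latent and observable scores."""
    return latent_score(s, r, o) + weight * observable_score(s, o)


# Rank two candidate edges that are absent from the toy graph.
candidates = [("bob", "knows", "alice"), ("bob", "knows", "carol")]
ranked = sorted(candidates, key=lambda t: combined_score(*t), reverse=True)
```

In this toy setting, the candidate edge between bob and alice ranks above the one between bob and carol, because the two signals agree: their latent features are similar and they share an observed neighbor.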
