Factorizing YAGO: scalable machine learning for linked data

Vast amounts of structured information have been published in the Semantic Web's Linked Open Data (LOD) cloud, and the volume is still growing rapidly. Yet access to this information via reasoning and querying is sometimes difficult because of LOD's size, partial data inconsistencies, and inherent noisiness. Machine learning offers an alternative approach to exploiting LOD, with the advantages that machine learning algorithms are typically robust to both noise and data inconsistencies and can efficiently exploit non-deterministic dependencies in the data. From a machine learning point of view, LOD is challenging because of its relational nature and its scale. Here, we present an efficient approach to relational learning on LOD, based on the factorization of a sparse tensor, that scales to data consisting of millions of entities, hundreds of relations, and billions of known facts. Furthermore, we show how ontological knowledge can be incorporated into the factorization to improve learning results and how computation can be distributed across multiple nodes. We demonstrate that our approach is able to factorize the YAGO 2 core ontology and globally predict statements for this large knowledge base using a single dual-core desktop computer. We also show experimentally that our approach achieves good results in several relational learning tasks that are relevant to Linked Data. Once a factorization has been computed, our model can efficiently predict, without any additional training, the likelihood of any of the 4.3 ⋅ 10^14 possible triples in the YAGO 2 core ontology.
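To make the final claim concrete, the following is a minimal sketch of how a computed factorization could score triples without further training. It assumes a bilinear, RESCAL-style model in which each relation slice of the tensor is approximated as X_k ≈ A R_k Aᵀ; the abstract does not spell out the model form, so this is an assumption. The names `A`, `R`, `score_triple`, and `score_all_objects` are hypothetical, and the random factors stand in for factors that a real factorization would produce.

```python
import numpy as np

def score_triple(A, R, s, p, o):
    """Score of (subject s, predicate p, object o) as a_s^T R_p a_o.

    Assumed model: A holds one latent vector per entity (n x r) and
    R holds one r x r interaction matrix per relation (m x r x r).
    """
    return float(A[s] @ R[p] @ A[o])

def score_all_objects(A, R, s, p):
    """Scores of (s, p, o) for every candidate object o at once.

    Only the latent factors are touched; the dense n x n x m tensor
    of all possible triples is never materialized.
    """
    return (A[s] @ R[p]) @ A.T

# Toy stand-in for learned factors (illustrative values only).
rng = np.random.default_rng(0)
n_entities, n_relations, rank = 5, 2, 3
A = rng.normal(size=(n_entities, rank))
R = rng.normal(size=(n_relations, rank, rank))

# Rank all candidate objects for subject 0 under relation 1.
scores = score_all_objects(A, R, s=0, p=1)
best_object = int(np.argmax(scores))
```

Under this assumed model, scoring one triple costs on the order of r² operations for rank r, independent of the number of entities, which is consistent with the claim that any possible triple can be scored once the factors are available.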
