Using Embeddings to Predict Changes in Large Semantic Graphs

Understanding and predicting how large knowledge graphs change over time is as difficult as it is useful. An important subtask of this artificial intelligence challenge is to characterize and predict three types of nodes: add-only nodes, which can only gain new edges; constant nodes, whose edges remain unchanged; and del-only nodes, whose edges can only be deleted. In this work, we improve on previous prediction approaches by using word embeddings from NLP to identify the nodes of the large semantic graph and by building a Logistic Regression model on top of them. We tested the proposed model on different versions of DBpedia and obtained the following improvements in F1 measure: up to 10% for add-only nodes, close to 15% for constant nodes, and close to 22% for del-only nodes.
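
To make the three node types concrete, the following minimal sketch labels nodes by comparing their outgoing edges in two versions of a graph. It is an illustration under stated assumptions, not the procedure used in this work: the triple representation, the helper names `edges_by_node` and `label_nodes`, the strict-subset reading of "add-only" and "del-only", and the extra "mixed" label for nodes that both gain and lose edges are all hypothetical choices made for the example.

```python
from collections import defaultdict


def edges_by_node(triples):
    """Group (subject, predicate, object) triples by their subject node."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        grouped[subject].add((predicate, obj))
    return grouped


def label_nodes(old_triples, new_triples):
    """Label each node of the old version by how its edge set changed.

    Assumed reading of the three classes (not confirmed by the source):
      constant : the edge set is identical in both versions
      add-only : edges were gained and none were lost
      del-only : edges were lost and none were gained
    Nodes that both gain and lose edges get the illustrative label "mixed".
    """
    old = edges_by_node(old_triples)
    new = edges_by_node(new_triples)
    labels = {}
    for node, before in old.items():
        after = new.get(node, set())
        if before == after:
            labels[node] = "constant"
        elif before < after:      # strict subset: only additions
            labels[node] = "add-only"
        elif after < before:      # strict superset: only deletions
            labels[node] = "del-only"
        else:
            labels[node] = "mixed"
    return labels


# Toy data standing in for two graph versions (made up for illustration).
old_version = [
    ("a", "p", "x"),
    ("b", "p", "x"),
    ("c", "p", "x"), ("c", "q", "y"),
    ("d", "p", "x"),
]
new_version = [
    ("a", "p", "x"), ("a", "q", "y"),
    ("b", "p", "x"),
    ("c", "p", "x"),
    ("d", "q", "z"),
]
labels = label_nodes(old_version, new_version)
```

Labels produced this way could serve as training targets for a classifier over node features such as embeddings; how the features are built is outside the scope of this sketch.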
