Paper Information - Learning Rules from Incomplete KGs using Embeddings

Learning Rules from Incomplete KGs using Embeddings

Rules over a Knowledge Graph (KG) capture interpretable patterns in data, and various methods for rule learning have been proposed. Since KGs are inherently incomplete, rules can be used to deduce missing facts. Statistical measures for learned rules, such as confidence, reflect rule quality well when the KG is reasonably complete; otherwise, these measures can be misleading. It is therefore difficult to learn high-quality rules from the KG alone, and scalability dictates that only a small set of candidate rules is generated. Consequently, the ranking and pruning of candidate rules is a major problem. To address this issue, we propose a rule learning method that utilizes probabilistic representations of missing facts. In particular, we iteratively extend rules induced from a KG by relying on feedback from a precomputed embedding model over the KG and external information sources, including text corpora. Experiments on real-world KGs demonstrate the effectiveness of our novel approach with respect to both the quality of the learned rules and the fact predictions that they produce.

Motivation. Rules are widely used to represent relationships and dependencies among data items in datasets and to capture the underlying patterns in data [1,16]. Applications of rules include healthcare [24], telecommunications [9], manufacturing [2,10,11,12,7], and commerce [17,6]. To facilitate rule construction, a variety of rule learning methods have been developed; see, e.g., [8] for an overview. Moreover, various statistical measures for evaluating the quality of learned rules, such as confidence, actionability, and unexpectedness, have been proposed. Rule learning has recently been adapted to the setting of Knowledge Graphs (KGs) [4,22], where data is represented as a graph of entities interconnected via relations and labeled with classes, or, more formally, as a set of grounded binary and unary atoms typically referred to as facts.
Examples of large-scale KGs include Wikidata [20], Yago [19], and Google’s KG. Since many KGs are constructed from semi-structured knowledge, such as Wikipedia, or harvested from the Web with a combination of statistical and linguistic methods, they are inherently incomplete [15,4].

Rules over KGs are of the form head ← body, where head is a binary atom and body is a conjunction of (possibly negated) binary or unary atoms. When rules are learned automatically, statistical measures such as support and confidence are used to assess their quality. Most notably, the confidence of a rule is the fraction of facts predicted by the rule that are indeed true in the KG. However, this is a meaningful measure of rule quality only when the KG is reasonably complete. For rules learned from largely incomplete KGs, confidence and other measures may be misleading, as they do not reflect the patterns in the missing facts. For example, a KG that knows only (or mostly) male CEOs would yield a heavily biased rule gender(X, male) ← isCEO(X, Y), isCompany(Y), which does not extend to the entirety of valid facts beyond the KG. Therefore, it is crucial that rules can be ranked by a meaningful quality measure that accounts for KG incompleteness.

This poster accompanies our conference paper [5].

Example. Consider a KG about people’s jobs, residences, and spouses, as well as office locations and headquarters of companies. Suppose a rule learning method has computed the following two rules:

    r1: livesIn(X, Y) ← worksFor(X, Z), hasOfficeIn(Z, Y)    (1)
    r2: livesIn(Y, Z) ← marriedTo(X, Y), livesIn(X, Z)       (2)

The rule r1 is quite noisy, as companies have offices in many cities but employees live and work in only one of them, while the rule r2 is clearly of higher quality. However, depending on how the KG is populated with instances, the rule r1 could nevertheless score higher than r2 in terms of confidence measures.
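The effect can be reproduced on a toy KG. The following Python sketch is only an illustration: the entities (ann, bob, carl, dana, acme, berlin) and the helper functions are hypothetical, and confidence is computed exactly as described above, i.e., as the fraction of facts predicted by a rule that are present in the KG.

```python
# Toy KG as a set of (subject, predicate, object) triples.
# All entity names are invented for illustration.
kg = {
    ("ann", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "hasOfficeIn", "berlin"),   # only one office happens to be recorded
    ("ann", "livesIn", "berlin"),
    ("bob", "livesIn", "berlin"),
    ("ann", "marriedTo", "carl"),
    ("bob", "marriedTo", "dana"),
    ("carl", "livesIn", "berlin"),       # dana's residence is missing from the KG
}


def predictions_r1(kg):
    """Facts predicted by r1: livesIn(X, Y) <- worksFor(X, Z), hasOfficeIn(Z, Y)."""
    return {
        (x, "livesIn", y)
        for (x, p1, z) in kg if p1 == "worksFor"
        for (z2, p2, y) in kg if p2 == "hasOfficeIn" and z2 == z
    }


def predictions_r2(kg):
    """Facts predicted by r2: livesIn(Y, Z) <- marriedTo(X, Y), livesIn(X, Z)."""
    return {
        (y, "livesIn", z)
        for (x, p1, y) in kg if p1 == "marriedTo"
        for (x2, p2, z) in kg if p2 == "livesIn" and x2 == x
    }


def confidence(predicted, kg):
    """Fraction of predicted facts that are already true in the KG."""
    if not predicted:
        return 0.0
    return len(predicted & kg) / len(predicted)


conf_r1 = confidence(predictions_r1(kg), kg)
conf_r2 = confidence(predictions_r2(kg), kg)
print("confidence(r1) =", conf_r1)
print("confidence(r2) =", conf_r2)
```

In this toy KG, the noisy rule r1 receives a higher confidence than r2 solely because of which facts happen to be recorded: the only known office of acme is in berlin, while the residence of dana is missing.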
For example, the KG may contain only a specific subset of company offices and only people who work for specific companies. If we knew the complete KG, then the rule r2 should presumably be ranked higher than r1.

Suppose we had a perfect oracle for the true and complete KG. Then we could learn even more sophisticated rules, such as

    r3: livesIn(X, Y) ← worksFor(X, Z), hasHeadquarterIn(Z, Y), not locatedIn(Y, USA)

This rule would capture that most people work in the same city as their employers’ headquarters, with the USA being an exception (assuming that people there are used to long commutes). This is an example of a rule that contains a negated atom in the rule body (so it is no longer a Horn rule) and has a partially grounded atom with a variable and a constant as its arguments.

Problem. The problem of KG incompleteness has been tackled by methods that (learn to) predict missing facts for KGs (or, more precisely, missing relational edges between existing entities). A prominent class of approaches is statistics-based and includes tensor factorization, e.g., [14], and neural-embedding-based models, e.g., [3,13]. Intuitively, these approaches turn a KG, possibly augmented with external sources such as text [23,25,18], into a probabilistic representation of its entities and relations, known as embeddings, and then predict the likelihood of missing facts by reasoning over the embeddings [21]. Such embeddings can complement the given KG and are a potential asset in overcoming the limitations that arise from incomplete KGs. Consider the following thought experiment: we compute embeddings from the KG and external text sources, which can then be used to predict the complete KG that comprises all valid facts. This would seemingly be the perfect starting point for learning rules, without the bias and quality problems of the incomplete KG. However, this scenario is heavily oversimplified.
The embedding-based fact predictions would themselves be very noisy and would also yield many spurious facts. Moreover, the computation of all fact predictions and the induction of all possible rules would pose a major scalability challenge: in practice, we need to restrict ourselves to computing only small subsets of likely fact predictions and promising rule candidates.

Our Approach. In this work, we propose an approach for rule learning guided by external sources that allows learning high-quality rules from incomplete KGs. In particular, our method extends rule learning by exploiting probabilistic representations of missing facts computed by embedding models of KGs and possibly other external information sources. More formally, let G be a KG; then a probabilistic KG P is a pair P = (G, f), where f is a probability function over the facts (or KG edges), and where we assume f(a) = 1 for each fact a ∈ G, which is already known to be true. Our proposal is to learn rules that not only describe the available graph G well, but also predict highly probable facts based on the function f. The key questions here are how to define the quality of a given rule r based on P and how to exploit it during rule learning for pruning unpromising rules.

We define a quality measure μ for rules over probabilistic KGs as a function μ : (r, P) ↦ α, where α ∈ [0, 1]. To measure the quality μ of r over P, we propose:

– to measure the quality μ1 of r over G, where μ1 : (r, G) ↦ α ∈ [0, 1],
– to measure the quality μ2 of G_r (i.e., G extended with edges derived using r) by relying on P_r = (G_r, f), where μ2 : (G′, (G, f)) ↦ α ∈ [0, 1] for G′ ⊇ G is the quality of extensions G′ of G over the signature of G given f, and
– to combine the results as a weighted sum.

That is, we define our hybrid rule quality function μ(r, P) as follows:

    μ(r, P) = (1 − λ) × μ1(r, G) + λ × μ2(G_r, P)    (3)

In this formula, μ1 can be any classical quality measure of rules over complete graphs.
Intuitively, μ2(G_r, P) is the quality of G_r with respect to f, which allows us to capture the information about facts missing in G that are relevant for r. The weighting factor λ, which we call the embedding weight, allows one to choose whether to rely more on the classical measure μ1 or on the measure μ2 of the quality of the extension G_r.

We propose to realize this approach by iteratively constructing rules over a KG and by collecting feedback from a precomputed embedding model, through specific queries issued to the model for assessing the quality of (partially constructed) rule candidates. This way, the rule induction loop is interleaved with the guidance from the embeddings, and we avoid scalability problems. Our machinery is also more expressive than many prior works on rule learning from KGs, as it allows non-monotonic rules with negated atoms as well as partially grounded atoms. Within this framework, we devise confidence measures that capture rule quality better than previous techniques and thus improve the ranking of rules.

Contribution. We propose a rule learning approach guided by external sources and show how to learn high-quality rules by utilizing feedback from embedding models. We implement our approach and present extensive experiments on real-world KGs, demonstrating the effectiveness of our approach with respect to both the quality of the mined rules and the predictions that they produce. Our code and data are made available to the research community at https://github.com/hovinhthinh/RuLES.
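The weighted sum in formula (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the RuLES repository: μ1 is taken to be plain confidence, μ2 is assumed to be the average of f over the facts that the rule adds to G (the text above leaves the concrete choice of μ2 open), and f is a hypothetical lookup table standing in for a precomputed embedding model. All facts and scores are invented.

```python
def confidence(predicted, kg):
    """mu1 (assumed): fraction of predicted facts that are already in the KG."""
    if not predicted:
        return 0.0
    return len(predicted & kg) / len(predicted)


def avg_probability_of_new_facts(predicted, kg, f):
    """mu2 (assumed): mean probability f of the facts the rule adds to the KG."""
    new_facts = predicted - kg
    if not new_facts:
        # Assumption: a rule that predicts nothing new gets no embedding credit.
        return 0.0
    return sum(f(a) for a in new_facts) / len(new_facts)


def hybrid_quality(predicted, kg, f, lam):
    """Formula (3): (1 - lam) * mu1 + lam * mu2, with embedding weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mu1 = confidence(predicted, kg)
    mu2 = avg_probability_of_new_facts(predicted, kg, f)
    return (1.0 - lam) * mu1 + lam * mu2


# Invented toy data: known facts and the facts two candidate rules would predict.
kg = {("ann", "livesIn", "berlin"), ("carl", "livesIn", "berlin")}
predicted_r1 = {("ann", "livesIn", "berlin"), ("ann", "livesIn", "munich")}
predicted_r2 = {("carl", "livesIn", "berlin"), ("dana", "livesIn", "paris")}

# Hypothetical embedding scores for facts that are missing from the KG.
scores = {("ann", "livesIn", "munich"): 0.1, ("dana", "livesIn", "paris"): 0.9}


def f(fact):
    """Probability function of the probabilistic KG: 1 for known facts."""
    return 1.0 if fact in kg else scores.get(fact, 0.0)


for lam in (0.0, 0.5):
    q1 = hybrid_quality(predicted_r1, kg, f, lam)
    q2 = hybrid_quality(predicted_r2, kg, f, lam)
    print(f"lambda={lam}: mu(r1)={q1:.2f}, mu(r2)={q2:.2f}")
```

With λ = 0, the two candidate rules are indistinguishable because they have the same confidence over G. With λ > 0, the rule whose new predictions the embedding model considers likely is ranked higher, which is the intended role of the embedding weight.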

[1] Zbigniew W. Ras, et al. Action-Rules: How to Increase Profit of a Company, 2000, PKDD.

[2] Evgeny Kharlamov, et al. Rule Learning from Knowledge Graphs Guided by Embedding Models, 2018, SEMWEB.

[3] Dimitris Kanellopoulos, et al. Association Rules Mining: A Recent Overview, 2006.

[4] Heiko Paulheim, et al. Knowledge graph refinement: A survey of approaches and evaluation methods, 2016, Semantic Web.

[5] Michel Manago, et al. CASSIOPÉE: Fehlerdiagnose von CFM 56-3 Triebwerken für Boeing 737 Flugzeuge, 1996, Künstliche Intell.

[6] Minlie Huang, et al. SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions, 2016, AAAI.

[7] Zhendong Mao, et al. Knowledge Graph Embedding: A Survey of Approaches and Applications, 2017, IEEE Transactions on Knowledge and Data Engineering.

[8] Jason Weston, et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.

[9] Juan-Zi Li, et al. Text-Enhanced Representation Learning for Knowledge Graph, 2016, IJCAI.

[10] Heikki Mannila, et al. Discovering Frequent Episodes in Sequences, 1995, KDD.

[11] Fabian M. Suchanek, et al. Fast rule mining in ontological knowledge bases with AMIE+, 2015, The VLDB Journal.

[12] Das Amrita, et al. Mining Association Rules between Sets of Items in Large Databases, 2013.

[13] Markus Krötzsch, et al. Wikidata, 2014, Commun. ACM.

[14] Lorenzo Rosasco, et al. Holographic Embeddings of Knowledge Graphs, 2015, AAAI.

[15] Juan-Zi Li, et al. RDF2Rules: Learning Rules from RDF Knowledge Bases by Mining Frequent Predicate Cycles, 2015, ArXiv.

[16] Peer Kröger, et al. Event-Enhanced Learning for KG Completion, 2018, ESWC.

[17] Gerhard Weikum, et al. YAGO: A Core of Semantic Knowledge, 2007, WWW.

[18] Thomas A. Runkler, et al. SemDia: Semantic Rule-Based Equipment Diagnostics Tool, 2017, CIKM.

[19] Zbigniew W. Ras, et al. Action rule discovery from incomplete data, 2010, Knowledge and Information Systems.

[20] Ian Horrocks, et al. Semantic Rules for Machine Diagnostics: Execution and Management, 2017, CIKM.

[21] Gregory Piatetsky-Shapiro, et al. Discovery, Analysis, and Presentation of Strong Rules, 1991, Knowledge Discovery in Databases.

[22] Hans-Peter Kriegel, et al. A Three-Way Model for Collective Learning on Multi-Relational Data, 2011, ICML.

[23] Janusz Wojtusiak, et al. Rule Learning in Healthcare and Health Services Research, 2019, Machine Learning in Healthcare Informatics.

[24] Thomas A. Runkler, et al. Semantic Rule-Based Equipment Diagnostics, 2017, SEMWEB.

[25] Thomas A. Runkler, et al. Semantic Rule-Based Equipment Diagnostic, 2017, International Semantic Web Conference.