RelSim: Relation Similarity Search in Schema-Rich Heterogeneous Information Networks

Recent studies have demonstrated the power of modeling real-world data as heterogeneous information networks (HINs) consisting of multiple types of entities and relations. Unfortunately, most such studies (e.g., on similarity search) confine their discussion to networks with only a few entity and relation types, such as DBLP. In the real world, however, the network schema can be rather complex, as in Freebase. In such schema-rich HINs, it is often too burdensome to ask users to provide explicit guidance in selecting relations for similarity search. In this paper, we study the problem of relation similarity search in schema-rich HINs. Under our problem setting, users are only asked to provide a few simple relation instance examples (e.g., 〈Barack Obama, John Kerry〉 and 〈George W. Bush, Condoleezza Rice〉) as a query, and we automatically detect the latent semantic relation (LSR) implied by the query (e.g., “president vs. secretary-of-state”). The LSR then helps to find other similar relation instances (e.g., 〈Bill Clinton, Madeleine Albright〉). To solve the problem, we first define a new meta-path-based relation similarity measure, RelSim, to measure the similarity between relation instances in schema-rich HINs. Then, given a query, we propose an optimization model that efficiently learns the LSR implied by the query through linear programming, and we perform fast relation similarity search using RelSim based on the learned LSR. Experiments on real-world datasets derived from Freebase demonstrate the effectiveness and efficiency of our approach.
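
The abstract does not spell out the RelSim formula, so the following is only a minimal sketch of how a weighted, meta-path-based similarity between relation instances could look. It assumes each relation instance (an entity pair) is summarized by a count of path instances per meta-path, and that the LSR is a non-negative weight per meta-path. The normalized-overlap form, the function names, the meta-path labels, the counts, and the hand-set weights (a stand-in for weights that would be learned by linear programming) are all illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumption: a relation instance is described by how many path instances
# connect its two entities under each meta-path (keys are hypothetical labels).
PathCounts = Dict[str, float]


def weighted_path_similarity(x: PathCounts, y: PathCounts,
                             weights: Dict[str, float]) -> float:
    """Weighted, normalized overlap of two meta-path count vectors.

    Returns a value in [0, 1]; 1.0 when both instances have identical
    counts on every weighted meta-path, 0.0 when they share none.
    This form is an assumed stand-in, not the paper's definition.
    """
    shared = sum(w * min(x.get(m, 0.0), y.get(m, 0.0))
                 for m, w in weights.items())
    total = sum(w * (x.get(m, 0.0) + y.get(m, 0.0))
                for m, w in weights.items())
    return 2.0 * shared / total if total > 0 else 0.0


def rank_candidates(query: PathCounts,
                    candidates: Dict[str, PathCounts],
                    weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort, best first."""
    scored = [(name, weighted_path_similarity(query, counts, weights))
              for name, counts in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hand-set weights standing in for a learned LSR (hypothetical values).
weights = {"appoints_secretary_of_state": 0.8, "same_party": 0.2}

# Made-up counts for a query instance and two candidate instances.
query = {"appoints_secretary_of_state": 1, "same_party": 1}
candidates = {
    "pair_a": {"appoints_secretary_of_state": 1, "same_party": 1},
    "pair_b": {"same_party": 1, "spouse": 1},
}

ranking = rank_candidates(query, candidates, weights)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup, the candidate that matches the query on the heavily weighted meta-path ranks first, which is the intended effect of weighting meta-paths by the latent relation rather than treating them uniformly.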
