Semantic proximity search on graphs with metagraph-based learning

Given ubiquitous graph data such as the Web and social networks, proximity search on graphs has been an active research topic. The task boils down to measuring the proximity between two nodes on a graph. Although most earlier studies deal only with homogeneous or bipartite graphs, many real-world graphs are heterogeneous, with objects of various types, giving rise to different semantic classes of proximity. For instance, on a social network two users can be close for different reasons, such as being classmates or family members, which represent two distinct classes of proximity. Thus, it is inadequate to measure only a “generic” form of proximity, as previous work has done. In this paper, we identify metagraphs as a novel and effective means to characterize the common structures for a desired class of proximity. We then propose a family of metagraph-based proximity measures and employ a supervised technique to automatically learn the right form of proximity within this family to suit the desired class. As it is expensive to match (i.e., find the instances of) a metagraph, we propose two novel approaches, dual-stage training and symmetry-based matching, to speed up matching. Finally, our experiments show significant gains in both accuracy and efficiency. For accuracy, we outperform the baselines by 11% and 16% in NDCG and MAP, respectively. For efficiency, dual-stage training reduces the overall matching cost by 83%, and symmetry-based matching further decreases the cost of matching individual metagraphs by 52%.
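
To make the idea of a "family" of metagraph-based proximity measures more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the formulation from the paper: the abstract does not give the formula, so the normalized weighted-count form below, the function name `metagraph_proximity`, and all of its parameters are hypothetical. Each metagraph contributes a count of matched instances, and a weight vector (which a supervised learner would fit for the desired class) selects one member of the family.

```python
from typing import Sequence


def metagraph_proximity(
    shared: Sequence[float],
    x_counts: Sequence[float],
    y_counts: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Hypothetical weighted, normalized proximity between nodes x and y.

    shared[i]   -- instances of metagraph i that contain both x and y
    x_counts[i] -- instances of metagraph i that contain x
    y_counts[i] -- instances of metagraph i that contain y
    weights[i]  -- non-negative weight of metagraph i (assumed to be learned)

    The formula is an assumption made for illustration only.
    """
    n = len(weights)
    if not (len(shared) == len(x_counts) == len(y_counts) == n):
        raise ValueError("all count vectors must match the weight vector length")

    # Weighted number of instances in which x and y co-occur, counted twice
    # so that the score reaches 1.0 when every instance of x also contains y.
    numerator = 2.0 * sum(w * s for w, s in zip(weights, shared))
    # Weighted number of instances each node participates in on its own.
    denominator = sum(w * (a + b) for w, a, b in zip(weights, x_counts, y_counts))

    return numerator / denominator if denominator > 0 else 0.0


# Two hypothetical metagraphs, e.g. "same school" and "same surname".
shared = [2, 0]
x_counts = [4, 1]
y_counts = [2, 3]

# Different weight vectors pick out different semantic classes of proximity.
classmate_score = metagraph_proximity(shared, x_counts, y_counts, [1.0, 0.0])
family_score = metagraph_proximity(shared, x_counts, y_counts, [0.0, 1.0])
```

In this sketch the same pair of nodes scores differently depending on the weights, which is the point of learning the weights per class rather than fixing one generic measure.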
