Meta-Path-Based Search and Mining in Heterogeneous Information Networks

Information networks that can be extracted from many domains are widely studied recently. Different functions for mining these networks are proposed and developed, such as ranking, community detection, and link prediction. Most existing network studies are on homogeneous networks, where nodes and links are assumed from one single type. In reality, however, heterogeneous information networks can better model the real-world systems, which are typically semi-structured and typed, following a network schema. In order to mine these heterogeneous information networks directly, we propose to explore the meta structure of the information network, i.e., the network schema. The concepts of meta-paths are proposed to systematically capture numerous semantic relationships across multiple types of objects, which are defined as a path over the graph of network schema. Meta-paths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network. Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure. Moreover, with user's guidance or feedback, we can select the best meta-path or their weighted combination for a specific mining task.

[1]  Bo Zhao,et al.  Probabilistic topic models with biased propagation on heterogeneous information networks , 2011, KDD.

[2]  Charu C. Aggarwal,et al.  Co-author Relationship Prediction in Heterogeneous Bibliographic Networks , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[3]  Jiawei Han,et al.  Ranking-based classification of heterogeneous information networks , 2011, KDD.

[4]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[5]  N. Christakis,et al.  The Spread of Obesity in a Large Social Network Over 32 Years , 2007, The New England journal of medicine.

[6]  E. Rogers,et al.  Diffusion of Innovations, 5th Edition , 2003 .

[7]  Yizhou Sun,et al.  iTopicModel: Information Network-Integrated Topic Modeling , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[8]  李丽,et al.  《Tsinghua Science and Technology》网上国际审稿 , 2002 .

[9]  Yizhou Sun,et al.  Graph Regularized Transductive Classification on Heterogeneous Information Networks , 2010, ECML/PKDD.

[10]  Yizhou Sun,et al.  Ranking-based clustering of heterogeneous information networks with star network schema , 2009, KDD.

[11]  Yizhou Sun,et al.  User guided entity similarity search using meta-path selection in heterogeneous information networks , 2012, CIKM.

[12]  Philip S. Yu,et al.  Integrating meta-path selection with user-guided object clustering in heterogeneous information networks , 2012, KDD.

[13]  Yizhou Sun,et al.  RankClus: integrating clustering with ranking for heterogeneous information network analysis , 2009, EDBT '09.

[14]  Jennifer Widom,et al.  SimRank: a measure of structural-context similarity , 2002, KDD.

[15]  Annette J. Dobson,et al.  An introduction to generalized linear models , 1991 .

[16]  Jiawei Han,et al.  Modeling and exploiting heterogeneous bibliographic networks for expertise ranking , 2012, JCDL '12.

[17]  E. Rogers,et al.  Diffusion of innovations , 1964, Encyclopedia of Sport Management.

[18]  Philip S. Yu,et al.  Relevance search in heterogeneous networks , 2012, EDBT '12.

[19]  Yizhou Sun,et al.  Query-driven discovery of semantically similar substructures in heterogeneous networks , 2012, KDD.

[20]  Yizhou Sun,et al.  Recommendation in heterogeneous information networks with implicit user feedback , 2013, RecSys.

[21]  A. Dobson An Introduction to Generalized Linear Models, Second Edition , 2001 .

[22]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[23]  Charu C. Aggarwal,et al.  When will it happen?: relationship prediction in heterogeneous information networks , 2012, WSDM '12.