Relationship Emergence Prediction in Heterogeneous Networks through Dynamic Frequent Subgraph Mining

With the rapid development of Web 2.0 and the Internet of Things, predicting relationships in heterogeneous networks has become an active research topic. Traditionally, to predict the emergence of a target relationship of interest, researchers analyze existing relationships in the heterogeneous network that relate to the target relationship in a particular way. However, most existing methods cannot systematically identify the relationships that are useful for the prediction task. This is especially true of relationships involving multiple objects of heterogeneous types, which may not lie on a simple path in the network concerned. Another problem with current practice is that existing solutions often ignore how the network structure evolves as newly emerged relationships are introduced. To overcome the first limitation, we propose a new algorithm that uses a disciplined graph search process to systematically and comprehensively detect relationships useful for predicting an arbitrary target relationship. To address the second limitation, the new algorithm predicts relationship occurrence through a supervised learning approach built on a series of temporally sensitive features. To evaluate the effectiveness of the new algorithm, we apply a prototype implementation to the DBLP bibliographic network to predict author citation relationships. We then compare its performance with that of a state-of-the-art peer method and a series of baseline methods. The new algorithm achieves consistently higher prediction accuracy across a range of prediction scenarios.
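The abstract does not spell out the features the algorithm uses. The following is therefore only a minimal sketch, under stated assumptions, of what one temporally sensitive feature for a heterogeneous relationship pattern could look like. It assumes a hypothetical "shared venue" pattern (author a writes paper p, author b writes paper q, and p and q appear in the same venue). It also assumes hypothetical window sizes of 1, 3 and 5 years before the prediction time t_pred. The function names `pattern_instances` and `temporal_features` and the toy edge list are illustrative only. The sketch is not the authors' algorithm and does not perform the frequent subgraph search that selects which patterns to use.

```python
from itertools import product

# Hypothetical toy data: typed, timestamped edges (source, relation, target, year).
EDGES = [
    ("a1", "writes", "p1", 2008),
    ("a2", "writes", "p2", 2009),
    ("a3", "writes", "p3", 2010),
    ("p1", "published_in", "v1", 2008),
    ("p2", "published_in", "v1", 2009),
    ("p3", "published_in", "v1", 2010),
]


def pattern_instances(edges, a, b):
    """Yield the timestamp of each instance of the hypothetical pattern
    a -writes-> p -published_in-> v <-published_in- q <-writes- b, with p != q.
    An instance's timestamp is the latest year among its four edges."""
    writes = {}  # author -> list of (paper, year)
    venue = {}   # paper -> (venue, year)
    for src, rel, dst, year in edges:
        if rel == "writes":
            writes.setdefault(src, []).append((dst, year))
        elif rel == "published_in":
            venue[src] = (dst, year)
    for (p, tp), (q, tq) in product(writes.get(a, []), writes.get(b, [])):
        if p == q or p not in venue or q not in venue:
            continue
        (vp, tvp), (vq, tvq) = venue[p], venue[q]
        if vp == vq:
            yield max(tp, tq, tvp, tvq)


def temporal_features(edges, a, b, t_pred, windows=(1, 3, 5)):
    """For each (hypothetical) window size w, count the pattern instances whose
    timestamp falls in [t_pred - w, t_pred); later instances are ignored."""
    times = [t for t in pattern_instances(edges, a, b) if t < t_pred]
    return [sum(1 for t in times if t >= t_pred - w) for w in windows]


print(temporal_features(EDGES, "a1", "a2", 2011))  # [0, 1, 1]
print(temporal_features(EDGES, "a1", "a3", 2011))  # [1, 1, 1]
```

In a supervised setting, one such count vector per candidate author pair could be computed for each selected pattern and concatenated. Each vector would then be labeled by whether the target relationship (here, a citing b) emerges after t_pred, and passed to any standard classifier. Windows of different lengths let the model weigh recent activity differently from older activity, which is one simple way to reflect network evolution over time.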
