Vertex collocation profiles: theory, computation, and results

We describe the vertex collocation profile (VCP) concept. VCPs provide rich information about the local structure surrounding embedded vertex pairs. VCP analysis gives researchers and domain experts a new tool for understanding the growth mechanisms underlying their networks and for analyzing link formation in the appropriate sociological, biological, physical, or other context. The same resolution that gives the VCP method its analytical power also lets it perform well at link prediction. We first develop the theory, mathematics, and algorithms underlying VCPs, and we provide timing results showing that the algorithms scale well even to large networks. We then show that VCP methods perform link prediction competitively with unsupervised and supervised methods across different network families. Unlike many analytical tools, VCPs inherently generalize to multirelational data, which gives them unique power in complex modeling tasks. To demonstrate this, we apply the VCP method to longitudinal networks by encoding temporally resolved information into different relations. In this way, transitions between VCP elements represent temporal evolutionary patterns in the longitudinal network data. Results show that VCPs can use this additional data, which is typically difficult to exploit, to improve the accuracy of predictive models. We conclude with our perspectives on the VCP method and its future in network science, particularly in link prediction.
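To make the counting idea concrete, here is a minimal Python sketch of a 3-vertex profile for a pair (u, v), assuming undirected edges and r relations supplied as separate edge lists (for example, one relation per time window). The names `build_adjacency`, `pair_state`, and `vcp3`, and the bit layout of the element index, are hypothetical illustrative choices, not the paper's notation or implementation. Each element counts the third vertices w whose links to u and v fall into one combination of relation states. For three vertices no isomorphism reduction is needed, because u and v keep their identities and only one other vertex remains. Vertices linked to neither endpoint are counted by subtraction rather than enumeration, so the work is proportional to the neighborhoods of u and v.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]
Adjacency = List[Dict[Hashable, Set[Hashable]]]


def build_adjacency(relations: List[Iterable[Edge]]) -> Adjacency:
    """One undirected neighbor map per relation (assumption: undirected edges)."""
    adj: Adjacency = []
    for edges in relations:
        nbrs: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        for a, b in edges:
            if a != b:  # ignore self-loops
                nbrs[a].add(b)
                nbrs[b].add(a)
        adj.append(nbrs)
    return adj


def pair_state(adj: Adjacency, a: Hashable, b: Hashable) -> int:
    """Bitmask whose bit k is set when a and b are linked in relation k."""
    state = 0
    for k, nbrs in enumerate(adj):
        if b in nbrs.get(a, ()):
            state |= 1 << k
    return state


def vcp3(adj: Adjacency, num_vertices: int, u: Hashable, v: Hashable) -> List[int]:
    """Hypothetical 3-vertex, r-relation profile of the pair (u, v).

    Element index = s_uv + B * s_uw + B**2 * s_vw, where B = 2**r is the
    number of possible relation states for one vertex pair.
    """
    base = 1 << len(adj)
    profile = [0] * (base ** 3)
    s_uv = pair_state(adj, u, v)

    # Third vertices adjacent to u or v in at least one relation.
    touched: Set[Hashable] = set()
    for nbrs in adj:
        touched |= nbrs.get(u, set())
        touched |= nbrs.get(v, set())
    touched -= {u, v}

    for w in touched:
        idx = s_uv + base * pair_state(adj, u, w) + base * base * pair_state(adj, v, w)
        profile[idx] += 1

    # Every remaining vertex is linked to neither u nor v in any relation.
    profile[s_uv] += num_vertices - 2 - len(touched)
    return profile


if __name__ == "__main__":
    earlier = [(1, 2), (2, 3)]          # relation 0: earlier time window
    later = [(2, 3), (3, 4), (1, 3)]    # relation 1: later time window
    adj = build_adjacency([earlier, later])
    profile = vcp3(adj, num_vertices=5, u=1, v=4)  # vertices 0..4
    print({i: c for i, c in enumerate(profile) if c})  # {0: 1, 4: 1, 40: 1}
```

In the usage example, element 40 records that vertex 3 became linked to both 1 and 4 only in the later window, while element 4 records that vertex 2 was linked to 1 only in the earlier window; this is the sense in which temporally encoded relations turn VCP elements into evolutionary patterns. The sketch includes the u–v link state in the element index; whether the original method indexes elements the same way is an assumption.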
