Significant edge detection in target network by exploring multiple auxiliary networks

Many real-world settings can be modelled as networks, but a major challenge in analysing network data is that important and reliable links between objects are usually obscured by noise and hence not readily discernible. In this paper, we propose to detect these important and reliable links, which we call significant edges, in a target network by using multiple auxiliary networks and a limited amount of labelled information. We first abstract the community knowledge learnt across the target and auxiliary networks to detect significant patterns. The mined community knowledge captures the key profile of network relationships and can therefore be used to determine whether an existing edge indicates a true or a false relationship. Experiments on real-world network data show that our two-stage solution, a joint matrix factorisation procedure followed by edge significance score ranking, accurately predicts significant edges in the target network by jointly exploring the knowledge embedded in both the target and auxiliary networks.
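
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the paper's actual formulation, which the abstract does not spell out. The sketch assumes that all networks share the same node set and ordering, that the joint factorisation is a plain non-negative factorisation A_i ≈ W H_i with one shared factor W fitted by multiplicative updates, and that an edge's significance score is its reconstructed value in the target network. The function names `joint_nmf` and `rank_edges`, the toy data, and all parameters are hypothetical, and the labelled information mentioned in the abstract is not used here.

```python
import numpy as np


def joint_nmf(networks, rank, n_iter=300, eps=1e-9, seed=0):
    """Factorise every adjacency matrix A_i as W @ H_i with one shared,
    non-negative W (assumed stand-in for the joint factorisation stage)."""
    rng = np.random.default_rng(seed)
    n = networks[0].shape[0]
    W = rng.random((n, rank)) + 0.1
    Hs = [rng.random((rank, A.shape[1])) + 0.1 for A in networks]
    for _ in range(n_iter):
        # Update each network-specific factor with W held fixed.
        Hs = [H * (W.T @ A) / (W.T @ W @ H + eps) for A, H in zip(networks, Hs)]
        # Update the shared factor using all networks at once.
        numer = sum(A @ H.T for A, H in zip(networks, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numer / denom
    return W, Hs


def rank_edges(target, W, H_target):
    """Score each existing target edge by its low-rank reconstruction
    and return (u, v, score) triples, highest score first."""
    recon = W @ H_target
    recon = (recon + recon.T) / 2  # symmetrise for an undirected network
    rows, cols = np.nonzero(np.triu(target, k=1))
    scores = recon[rows, cols]
    order = np.argsort(-scores)
    return [(int(rows[i]), int(cols[i]), float(scores[i])) for i in order]


# Toy data (hypothetical): two communities of 10 nodes each.
n, half = 20, 10
block = np.zeros((n, n))
block[:half, :half] = 1.0
block[half:, half:] = 1.0
np.fill_diagonal(block, 0.0)

# The target network carries a few spurious cross-community edges.
noisy_pairs = [(0, 12), (3, 17), (5, 10), (8, 19)]
target = block.copy()
for u, v in noisy_pairs:
    target[u, v] = target[v, u] = 1.0

# Auxiliary networks share the community structure but not the noise.
auxiliary = [block.copy(), block.copy()]

W, Hs = joint_nmf([target] + auxiliary, rank=2)
ranking = rank_edges(target, W, Hs[0])
```

In this toy setting the shared factor is shaped mostly by the community structure common to all three networks, so the cross-community edges that appear only in the target should receive low reconstructed scores and fall towards the bottom of `ranking`.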
