Unsupervised link prediction using aggregative statistics on heterogeneous social networks

Privacy has become an important concern for online social networks. In services such as Foursquare.com, whether a person likes an article is considered private and is therefore not disclosed; only the aggregative statistics of articles (i.e., how many people like each article) are revealed. This paper asks: can we predict the opinion holder in a heterogeneous social network without any labeled data? This question generalizes to the problem of link prediction with aggregative statistics. We devise a novel unsupervised framework to solve this problem, consisting of two main components: (1) a three-layer factor graph model and three types of potential functions; (2) a ranked-margin learning and inference algorithm. We evaluate our method on four diverse prediction scenarios using four datasets: preference (Foursquare), repost (Twitter), response (Plurk), and citation (DBLP). We further use nine unsupervised models as baselines. Our approach not only outperforms the baselines in all scenarios but also achieves, on average, improvements of 9.90% in AUC and 12.59% in NDCG over the best competitors. The resources are available at http://www.csie.ntu.edu.tw/~d97944007/aggregative/
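The two evaluation measures named above, AUC and NDCG, both score a ranked list of candidate links against the links that actually exist. The following is a minimal sketch of how such scores might be computed, not the paper's evaluation code: the function names `auc` and `ndcg`, the binary labels, and the log2(rank + 1) discount are assumptions made for illustration, and the abstract does not state which NDCG variant was used.

```python
import math

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ndcg(scores, labels):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking.

    Assumes the common discount 1 / log2(rank + 1) with ranks starting at 1.
    """
    # Candidate indices sorted by descending predicted score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    dcg = sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    ideal = sum(y / math.log2(rank + 2)
                for rank, y in enumerate(sorted(labels, reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical toy data: four candidate links, two of which truly exist.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)
example_ndcg = ndcg(example_scores, example_labels)
```

On this toy input, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75; the NDCG is lower than 1 because one true link is ranked third rather than second.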
