Predicting Friendship Links in Social Networks Using a Topic Modeling Approach

In recent years, the number of social network users has increased dramatically. The resulting volume of user data has created great opportunities for data mining. One data mining problem of interest for social networks is friendship link prediction. Intuitively, a friendship link between two users can be predicted from their common friends and shared interests. However, using user interests directly is challenging, given the large number of possible interests. Earlier approaches tackled this problem with an explicit user interest ontology, but constructing the ontology proved computationally expensive and the resulting ontology was not very useful. As an alternative, we propose a topic modeling approach to predicting new friendships from interests and existing friendships. Specifically, we use Latent Dirichlet Allocation (LDA) to model user interests, thereby creating an implicit interest ontology. We construct features for the link prediction problem from the resulting topic distributions. Experimental results on several LiveJournal data sets of varying sizes show the usefulness of the LDA features for predicting friendships.
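To make the feature construction step concrete, the following is a minimal sketch of how pairwise features might be derived once each user has a topic distribution. It assumes the per-user topic distributions have already been produced by an LDA model; the function names, the choice of cosine similarity and Jensen-Shannon divergence, and the sample users are all illustrative assumptions, not the feature set described in the paper.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def pair_features(u, v, topics, friends):
    """Hypothetical feature vector for a candidate link (u, v)."""
    common = friends.get(u, set()) & friends.get(v, set())
    return {
        "common_friends": len(common),
        "topic_cosine": cosine_similarity(topics[u], topics[v]),
        "topic_js": js_divergence(topics[u], topics[v]),
    }


# Made-up topic distributions over three topics (assumed LDA output).
topics = {
    "alice": [0.7, 0.2, 0.1],
    "bob": [0.6, 0.3, 0.1],
    "carol": [0.1, 0.1, 0.8],
}

# Made-up existing friendship lists.
friends = {
    "alice": {"dave", "erin"},
    "bob": {"dave", "erin", "frank"},
    "carol": {"frank"},
}

similar = pair_features("alice", "bob", topics, friends)
dissimilar = pair_features("alice", "carol", topics, friends)
```

In this toy data, the alice/bob pair shares two friends and has closely matching topic distributions, while alice/carol shares none and diverges on topics. Feature dictionaries like these could then be passed to any standard classifier to predict whether a link exists.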
