Scalable learning of users' preferences using networked data

Users' personal information, such as their political views, is important for many applications, including targeted advertising and real-time monitoring of political opinion. The huge amount of data generated by social media users presents both opportunities and challenges for studying these preferences at a large scale. In this paper, we aim to infer social media users' political views when only network information is available. In particular, given the personal preferences of some users, how can we infer the preferences of unobserved individuals in the same network? Many existing solutions address the problem of classification with networked data. However, social media networks normally involve millions or even hundreds of millions of nodes, which makes scalability an important problem when inferring personal preferences. To address the scalability issue, we use social influence theory to construct new features based on a combination of local and global structures of the network. We then use these features to train classifiers and predict users' preferences. Because of the size of real-world social networks, using the entire network information is inefficient and often impractical. By extracting local social dimensions, we present an efficient and scalable solution. Further, by capturing the network's global pattern, the proposed solution balances accuracy against efficiency.
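The general idea of building network-derived features and training a classifier on them can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the features or the classifier, so everything below is an assumption made for illustration. The "local" feature is taken to be the share of a node's labeled neighbors holding view 1, a two-hop neighborhood share stands in for a wider, more "global" signal, and a nearest-centroid rule stands in for the classifier. All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def share_of_view_one(nodes, labels):
    """Share of labeled nodes in `nodes` holding view 1 (0.5 if none are labeled)."""
    votes = [labels[n] for n in nodes if n in labels]
    return sum(votes) / len(votes) if votes else 0.5


def node_features(graph, labels, node):
    """Assumed features: (direct-neighbor share, two-hop share), excluding the node itself."""
    neighbors = set(graph[node])
    two_hop = set(neighbors)
    for n in neighbors:
        two_hop |= graph[n]
    two_hop.discard(node)
    return (share_of_view_one(neighbors, labels), share_of_view_one(two_hop, labels))


def train_centroids(graph, labels):
    """Average the feature vectors of the labeled nodes, per class."""
    grouped = defaultdict(list)
    for node, view in labels.items():
        grouped[view].append(node_features(graph, labels, node))
    return {
        view: tuple(sum(col) / len(col) for col in zip(*rows))
        for view, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda view: sum((c - f) ** 2 for c, f in zip(centroids[view], features)),
    )


def infer_views(edges, labels):
    """Predict a view (0 or 1) for every node that has no observed label."""
    graph = build_graph(edges)
    centroids = train_centroids(graph, labels)
    return {
        node: predict(centroids, node_features(graph, labels, node))
        for node in graph
        if node not in labels
    }


# Toy network: two triangles joined by the edge c-d; views of c and d are unobserved.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
toy_labels = {"a": 0, "b": 0, "e": 1, "f": 1}
print(infer_views(toy_edges, toy_labels))
```

Each feature here depends only on a node's one- and two-hop neighborhood, so the cost per node is bounded by neighborhood size rather than by the size of the whole network. That is the kind of trade-off between accuracy and efficiency the abstract describes, although the actual features may differ substantially.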
