Improving Users’ Demographic Prediction via the Videos They Talk about

In this paper, we improve demographic prediction for microblog users by fully utilizing their video-related behaviors. First, we collect the describing words of currently popular videos, including video names, actor names and video keywords, from video websites. Second, we search for these describing words in users’ microblogs and build direct relationships between users and the words that appear. Third, to make these sparse relationships denser, we propose a Bayesian method to calculate the probability of connections between users and the other video describing words. Lastly, we build two models to predict users’ demographics from the obtained direct and indirect relationships. Experimental results on a large real-world dataset show that our method significantly improves the demographic predictive ability of these describing words.
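The first steps of this pipeline can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not specify the Bayesian formulation, so the sketch assumes a simple stand-in in which the conditional probability P(w_j | w_i) is estimated from user-level co-occurrence with add-alpha smoothing, and a user's indirect score for an unobserved word is the average of that probability over the words the user did mention. The function names (`direct_links`, `cooccurrence_prob`, `indirect_scores`), the substring matching and the smoothing parameter `alpha` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def direct_links(microblogs, words):
    """Map each user to the set of describing words found in their posts.

    Assumption: plain case-insensitive substring matching stands in for
    whatever matching procedure is actually used.
    """
    links = {}
    for user, posts in microblogs.items():
        text = " ".join(posts).lower()
        links[user] = {w for w in words if w.lower() in text}
    return links


def cooccurrence_prob(links, words, alpha=1.0):
    """Estimate P(w_j | w_i) from user-level co-occurrence counts.

    Assumption: add-alpha smoothing over the vocabulary; this is an
    illustrative estimator, not the one proposed in the paper.
    """
    pair_count = defaultdict(float)  # users who mention both w_i and w_j
    word_count = defaultdict(float)  # users who mention w_i
    for mentioned in links.values():
        for wi in mentioned:
            word_count[wi] += 1
            for wj in mentioned:
                if wi != wj:
                    pair_count[(wi, wj)] += 1
    vocab_size = len(words)
    return {
        (wi, wj): (pair_count[(wi, wj)] + alpha)
        / (word_count[wi] + alpha * vocab_size)
        for wi in words
        for wj in words
        if wi != wj
    }


def indirect_scores(links, words, cond):
    """Score each unobserved word for each user.

    The score is the mean of P(w_j | w_i) over the words w_i that the
    user mentioned directly; users with no direct links get no scores.
    """
    scores = {}
    for user, mentioned in links.items():
        if not mentioned:
            scores[user] = {}
            continue
        scores[user] = {
            wj: sum(cond[(wi, wj)] for wi in mentioned) / len(mentioned)
            for wj in words
            if wj not in mentioned
        }
    return scores
```

In this sketch, a user who mentions only one video name receives a higher indirect score for an actor name that frequently co-occurs with that video in other users' microblogs than for an unrelated video. The direct links and indirect scores could then be combined into a feature vector for any downstream classifier.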
