Identifying Influential Users' Professions via the Microblogs They Forward

Identifying the professions of users, especially influential ones, is an important task for most social media sites. Much prior work has approached this task by mining user-generated text or by analyzing the structure of the social network. In this paper, we address it by examining only which microblog messages an influential user has forwarded. First, we define hot microblog messages according to two criteria and identify them among a large number of candidate messages; each identified message points to a specific hot event. Next, we group similar hot messages together based on their word similarity, semantic similarity, and the similarity of their forwarders. Finally, we represent each user by the hot messages they have forwarded and design a method to identify their profession from this representation. Experiments on a real-world dataset that we collected show that our method performs significantly better than the traditional method.
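The grouping and representation steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: it uses only forwarders' similarity (word and semantic similarity are omitted), measures that similarity with Jaccard overlap, and groups messages greedily against a threshold. The function names, the threshold value, and the toy data are all hypothetical.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_messages(forwarders, threshold):
    """Greedy single-pass grouping (an assumption, not the paper's method):
    a message joins the first group whose first member has a forwarder
    Jaccard similarity of at least `threshold`; otherwise it starts a group."""
    groups = []  # each group is a list of message ids
    for msg in sorted(forwarders):
        for group in groups:
            if jaccard(forwarders[msg], forwarders[group[0]]) >= threshold:
                group.append(msg)
                break
        else:
            groups.append([msg])
    return groups

def user_vectors(forwarders, groups):
    """Represent each user as a count vector over message groups."""
    vecs = defaultdict(lambda: [0] * len(groups))
    for gi, group in enumerate(groups):
        for msg in group:
            for user in forwarders[msg]:
                vecs[user][gi] += 1
    return dict(vecs)

# Toy data: message id -> set of users who forwarded it (hypothetical).
forwarders = {
    "m1": {"alice", "bob", "carol"},
    "m2": {"alice", "bob"},
    "m3": {"dave"},
}
groups = group_messages(forwarders, threshold=0.5)
vectors = user_vectors(forwarders, groups)
```

The resulting per-user vectors could then be passed to any standard classifier to predict a profession label; which classifier to use is left open here.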
