PRISM: Profession Identification in Social Media with Personal Information and Community Structure

A user's profession plays an important role in commercial services such as personalized recommendation and targeted advertising. In practice, profession information is usually unavailable because of privacy and other concerns. In this paper, we explore the task of identifying users' professions from their behavior in social media. Three challenges make the task non-trivial: how to incorporate heterogeneous information about user behavior, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present PRISM, a framework for PRofession Identification in Social Media. It takes advantage of both the personal information and the community structure of users in the following ways: (1) We present a cascaded two-level classifier with heterogeneous personal features to measure the confidence that a user belongs to each profession. (2) We present a multi-training process that takes advantage of both labeled and unlabeled data to enhance classification performance. (3) We design a profession identification method that jointly considers the confidences from personal features and from community structure. We collect a real-world dataset for experiments, and the results show that our method is significantly more effective than the baseline methods.
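To make aspect (3) concrete, the following is a minimal sketch of one way per-user confidences from personal features could be combined with confidences drawn from a user's community. It is not the method used in PRISM: the function name `identify_professions`, the weight `alpha`, the linear interpolation, and the use of the mean over neighbors are all assumptions made for illustration only.

```python
from typing import Dict, List

# Maps user -> profession -> confidence from a personal-feature classifier.
Confidences = Dict[str, Dict[str, float]]


def identify_professions(personal: Confidences,
                         neighbors: Dict[str, List[str]],
                         alpha: float = 0.5) -> Dict[str, str]:
    """Pick one profession per user from blended confidences.

    Hypothetical scheme: the final score is
    alpha * personal confidence + (1 - alpha) * mean confidence of neighbors.
    Users without known neighbors fall back to personal confidence alone.
    """
    predictions = {}
    for user, own in personal.items():
        # Keep only neighbors for which we have classifier confidences.
        peers = [p for p in neighbors.get(user, []) if p in personal]
        scores = {}
        for profession, conf in own.items():
            if peers:
                community = sum(personal[p].get(profession, 0.0)
                                for p in peers) / len(peers)
                scores[profession] = alpha * conf + (1 - alpha) * community
            else:
                scores[profession] = conf
        # The predicted profession is the one with the highest blended score.
        predictions[user] = max(scores, key=scores.get)
    return predictions


# Toy data (invented for illustration, not from the paper's dataset).
personal = {
    "a": {"engineer": 0.9, "doctor": 0.1},
    "b": {"engineer": 0.8, "doctor": 0.2},
    "c": {"engineer": 0.45, "doctor": 0.55},
}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}

result = identify_professions(personal, neighbors, alpha=0.5)
```

In this toy example, user `c` is a borderline case on personal features alone, and the confidences of `c`'s neighbors tip the blended score toward their shared profession. Setting `alpha=1.0` disables the community term and recovers the purely personal prediction.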
