PRISM

Profession is an important social attribute of people. It plays a crucial role in commercial services such as personalized recommendation and targeted advertising. In practice, however, profession information is usually unavailable for privacy and other reasons. In this article, we explore the task of identifying users' professions from their behaviors on social media. Three challenges make the task non-trivial: how to incorporate heterogeneous information about user behaviors, how to effectively utilize both labeled and unlabeled data, and how to exploit community structure. To address these challenges, we present a framework called PRISM (Profession Identification in Social Media). It takes advantage of both the personal information and the community structure of users in the following ways: (1) we present a cascaded two-level classifier with heterogeneous personal features to measure the confidence of users belonging to different professions; (2) we present a multi-training process that takes advantage of both labeled and unlabeled data to enhance classification performance; and (3) we design a profession identification method that jointly considers the confidences derived from personal features and from community structure. We collect a real-world dataset to conduct experiments, and the experimental results show that our method significantly outperforms the baseline methods. By applying the method to predict the professions of a large number of users, we also analyze the characteristics of microblog users and find significant differences among users of different professions in demographics, social network structure, and linguistic style.
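Step (3) can be illustrated with a minimal sketch. The abstract does not state how the two sources of confidence are combined, so the code below assumes a simple linear blend: each user's personal-feature confidence for a profession is mixed with the average confidence of the other members of that user's community, weighted by a hypothetical parameter `alpha`, and the highest-scoring profession becomes the prediction. The function name `combine_confidences`, the weighting scheme, and the precomputed community assignments are all illustrative assumptions, not PRISM's actual formulation.

```python
from collections import defaultdict


def combine_confidences(personal, communities, alpha=0.7):
    """Hypothetical sketch: blend personal-feature confidences with community evidence.

    personal:    dict user -> dict profession -> confidence from personal features
    communities: dict user -> community id (assumed to come from a prior
                 community-detection step; not specified by the abstract)
    alpha:       assumed weight on the user's own (personal) evidence
    Returns dict user -> predicted profession.
    """
    # Group users by their community id.
    members = defaultdict(list)
    for user, cid in communities.items():
        members[cid].append(user)

    predictions = {}
    for user, scores in personal.items():
        cid = communities.get(user)
        # Peers: other users in the same community that have confidence scores.
        peers = [p for p in members.get(cid, []) if p != user and p in personal]
        combined = {}
        for prof, conf in scores.items():
            if peers:
                # Mean confidence of the community peers for this profession.
                community_conf = sum(personal[p].get(prof, 0.0) for p in peers) / len(peers)
            else:
                # No peers available: fall back to the personal evidence alone.
                community_conf = conf
            combined[prof] = alpha * conf + (1 - alpha) * community_conf
        # Predict the profession with the highest blended score.
        predictions[user] = max(combined, key=combined.get)
    return predictions


personal = {
    "a": {"IT": 0.4, "Finance": 0.6},
    "b": {"IT": 0.9, "Finance": 0.1},
    "c": {"IT": 0.8, "Finance": 0.2},
    "d": {"IT": 0.3, "Finance": 0.7},
}
communities = {"a": 1, "b": 1, "c": 1, "d": 2}
print(combine_confidences(personal, communities))
```

In this toy example, user "a" leans toward Finance on personal features alone, but both community peers are confidently IT, so the blended score relabels "a" as IT; user "d" has no peers and keeps the personal prediction. Setting `alpha=1.0` ignores the community entirely and reduces to the personal-feature classifier's own output.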
