User Classification with Multiple Textual Perspectives

Textual information is critically important for automatic user classification in social media. However, most previous studies model textual features from a single perspective, whereas the text on a user's homepage typically comes in different styles, such as original messages and comments from others. In this paper, we propose a novel approach, namely ensemble LSTM, to user classification by incorporating multiple textual perspectives. Specifically, our approach first learns an LSTM representation with an LSTM recurrent neural network and then uses a joint learning method to integrate all naturally divided textual perspectives. Empirical studies on two basic user classification tasks, i.e., gender classification and age classification, demonstrate the effectiveness of the proposed approach to user classification with multiple textual perspectives.
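The two-step idea in the abstract (encode text with an LSTM, then integrate the perspectives jointly) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify how the perspectives are combined, so the sketch assumes one plausible reading in which each perspective has its own LSTM encoder and the final hidden states are concatenated and fed to a shared softmax layer. The class `TinyLSTM`, the function `classify_user`, the perspective names, the dimensions, and the random untrained weights are all hypothetical; biases and training are omitted for brevity.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyLSTM:
    """Forward-only LSTM encoder (hypothetical; weights are random, no biases)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
                  for g in "ifoc"}

    def encode(self, sequence):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for x in sequence:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
            f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
            o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
            g = [math.tanh(a) for a in matvec(self.W["c"], z)]  # candidate cell
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h  # final hidden state serves as the text representation

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_user(perspectives, encoders, W_out):
    # Assumed integration step: concatenate per-perspective representations
    # and apply one shared softmax layer over the joint vector.
    joint = []
    for name in sorted(perspectives):
        joint.extend(encoders[name].encode(perspectives[name]))
    return softmax(matvec(W_out, joint))

rng = random.Random(0)
EMB, HID, CLASSES = 4, 3, 2           # toy sizes, chosen for illustration only
names = ["comments", "messages"]      # two assumed textual perspectives
encoders = {n: TinyLSTM(EMB, HID, rng) for n in names}
W_out = rand_matrix(CLASSES, HID * len(names), rng)

# A toy user: five random "word embeddings" per perspective.
user = {n: [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]
        for n in names}
probs = classify_user(user, encoders, W_out)
print(probs)
```

In a trained system the encoders and the output layer would be learned together from labeled users, for example with gender or age labels; here the output is only a well-formed probability distribution over two classes.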
