Online Bayesian Models for Personal Analytics in Social Media

Latent author attribute prediction in social media provides a novel set of conditions for the construction of supervised classification models. With individual authors as training and test instances, their associated content ("features") is made available incrementally over time as they converse in discussion forums. We propose several approaches to handling this dynamic data, ranging from traditional batch training and testing, to incremental bootstrapping, to active learning via crowdsourcing. Our underlying model relies on an intuitive application of Bayes' rule, which should be easy for the community to adopt, thus allowing for a general shift towards online modeling for social media.
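To make the incremental use of Bayes' rule concrete, the following is a minimal sketch and not the model from the paper. It assumes a fixed set of attribute labels, per-label token likelihoods that have already been estimated, and conditional independence of tokens given the label. The function name `update_log_posterior`, the toy labels `A` and `B`, and the likelihood values are all hypothetical. Each new message turns the current posterior into the prior for the next update.

```python
import math

def update_log_posterior(log_post, tokens, log_lik, unk=math.log(1e-6)):
    """One Bayes-rule step: posterior is proportional to prior times likelihood.

    log_post: dict label -> current log P(label | content seen so far)
    tokens:   tokens of the newly observed message
    log_lik:  dict label -> dict token -> log P(token | label) (assumed given)
    unk:      log-likelihood used for tokens missing from a table (assumption)
    """
    # Add the new evidence to each label's running log score.
    scores = {
        label: lp + sum(log_lik[label].get(t, unk) for t in tokens)
        for label, lp in log_post.items()
    }
    # Renormalize with log-sum-exp so the scores form a distribution again.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {label: s - log_z for label, s in scores.items()}

# Toy likelihood tables (made-up values, for illustration only).
log_lik = {
    "A": {"x": math.log(0.8), "y": math.log(0.2)},
    "B": {"x": math.log(0.2), "y": math.log(0.8)},
}

# Uniform prior over the two hypothetical labels.
posterior = {"A": math.log(0.5), "B": math.log(0.5)}

# Messages arrive one at a time; the posterior is updated after each.
for message in (["x"], ["x"], ["y"]):
    posterior = update_log_posterior(posterior, message, log_lik)
    print({label: round(math.exp(lp), 3) for label, lp in posterior.items()})
```

Because the update only needs the current posterior and the new message, no earlier content has to be stored or reprocessed, which is what makes this style of model suited to streaming data.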
