Content + Context Networks for User Classification in Twitter

Twitter is a massive platform for open communication among diverse groups of people. While traditional media segregate the world’s population along lines of language, age, physical location, social status, and many other characteristics, Twitter cuts across these divides. The result is an extremely diverse social network. In this work, we combine features of this network structure with content analytics on tweets to create a content + context network that captures relations not only between people, but also between people and content and between content and content. This rich structure enables deep analysis of many aspects of communication over Twitter. We focus on predicting user classifications using relational probability trees with features drawn from content + context networks. Experiments demonstrate that these features are salient and complementary for user classification.
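To make the three relation types concrete, the following is a minimal sketch of one way such a network could be stored. It is not the construction used in this work: treating hashtags as content nodes, using hashtag co-occurrence within a tweet as the content-content relation, and the names `build_network` and `neighbor_counts` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_network(tweets, follows):
    """Build a typed edge set from toy data.

    tweets:  list of (user, text) pairs
    follows: list of (follower, followee) pairs
    Assumption: hashtags stand in for "content"; a real pipeline might use
    topics, n-grams, or other content features instead.
    """
    edges = defaultdict(set)  # edge type -> set of (source, target) node pairs

    # Context: relations between people.
    for follower, followee in follows:
        edges["user-user"].add((("user", follower), ("user", followee)))

    for user, text in tweets:
        # Lowercase and deduplicate the hashtags in this tweet.
        tags = sorted({w.lower() for w in text.split() if w.startswith("#")})

        # Relations between people and content.
        for tag in tags:
            edges["user-content"].add((("user", user), ("content", tag)))

        # Relations between content and content (co-occurrence in one tweet).
        for i, a in enumerate(tags):
            for b in tags[i + 1:]:
                edges["content-content"].add((("content", a), ("content", b)))

    return edges


def neighbor_counts(edges, node):
    """Count, per edge type, how many edges touch the given node."""
    return {
        etype: sum(1 for src, dst in pairs if node in (src, dst))
        for etype, pairs in edges.items()
    }
```

Per-node counts such as those returned by `neighbor_counts` are the kind of simple relational feature a downstream classifier could consume; the features actually used with relational probability trees may differ.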
