Age Inference Using A Hierarchical Attention Neural Network

Demographic attributes such as age, gender, and location have been studied extensively, but most previous studies combine several sources of data, such as the user's biography, pictures, posts, and network, to obtain reasonable inference accuracy. Collecting all of these forms of data is not always practical. In this paper, we therefore consider methods for inferring age that use only Twitter posts (tweet text and emojis). We propose a hierarchical attention neural model that integrates the independent linguistic knowledge gained from text and from emojis when making a prediction. The hierarchical model captures the intra-post relationships between these post components as well as the inter-post relationships among a user's posts. Our empirical evaluation on a data set generated from Wikidata demonstrates that our model outperforms state-of-the-art models and still performs well when the number of posts per user in the training data set is reduced.
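As a rough illustration of the two levels described above, the following sketch applies attention pooling first within a post (over text vectors and emoji vectors, then over the two resulting components) and then across a user's posts. It is not the paper's implementation: the function names, the dot-product scoring, the use of random vectors in place of learned embeddings and context parameters, and the assumption that every post contains at least one emoji are all choices made here for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Attention-weighted average of `vectors`.

    Scores are dot products with `context`, a stand-in for a learned
    context vector (an assumption, not the paper's scoring function).
    """
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_user(posts, ctx_text, ctx_emoji, ctx_component, ctx_post):
    """Hypothetical two-level encoder.

    `posts` is a list of (text_vectors, emoji_vectors) pairs, one per post.
    """
    post_vectors = []
    for text_vecs, emoji_vecs in posts:
        # Intra-post level: summarize each component, then combine them.
        text_repr, _ = attention_pool(text_vecs, ctx_text)
        emoji_repr, _ = attention_pool(emoji_vecs, ctx_emoji)
        post_repr, _ = attention_pool([text_repr, emoji_repr], ctx_component)
        post_vectors.append(post_repr)
    # Inter-post level: weight the posts of one user against each other.
    return attention_pool(post_vectors, ctx_post)


random.seed(0)
DIM = 4


def rand_vec():
    """Random vector standing in for a learned embedding or parameter."""
    return [random.uniform(-1.0, 1.0) for _ in range(DIM)]


# Three toy posts, each with five word vectors and two emoji vectors.
posts = [([rand_vec() for _ in range(5)], [rand_vec() for _ in range(2)])
         for _ in range(3)]
contexts = [rand_vec() for _ in range(4)]
user_repr, post_weights = encode_user(posts, *contexts)
```

In a trained model, `user_repr` would presumably feed a classifier over age groups, and `post_weights` would indicate how much each post contributes to the prediction; here both are computed from random values and carry no meaning.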
