Cross-media Age Regression with Textual Adaptation

In realistic scenarios, an age regression model trained on one social media platform (the source media) generally performs poorly when tested on another (the target media). In this paper, we propose a textual adaptation approach to cross-media age regression, which aims to improve regression performance by exploiting textual features in labeled data from the source media and unlabeled data from the target media. The approach rests on the observation that many textual features are shared by the data from both platforms. Specifically, two views produced by random subspace generation (RSG) are used to train two separate regressors in a co-training algorithm, which adds automatically labeled samples from the target media to the training data. Moreover, we tackle the challenge of confidence evaluation in co-training with the query-by-committee (QBC) approach. Empirical studies demonstrate the effectiveness of the proposed approach for cross-media age regression.
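
The co-training loop described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the paper's implementation: the k-nearest-neighbour regressor, the disjoint half-and-half feature split, the use of regressor disagreement as the committee-based confidence score, and all function and parameter names (`random_subspaces`, `knn_predict`, `co_train`, `rounds`, `per_round`) are assumptions made for this example.

```python
import random


def random_subspaces(n_features, seed=0):
    """Split feature indices into two random, disjoint views (a simple RSG variant, assumed)."""
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    half = n_features // 2
    return sorted(idx[:half]), sorted(idx[half:])


def knn_predict(train_X, train_y, x, view, k=3):
    """Predict an age as the mean label of the k nearest training samples, using one view only."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in view), y)
        for row, y in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def co_train(src_X, src_y, tgt_X, rounds=5, per_round=2, k=3, seed=0):
    """Grow the labeled set with target samples on which the two view regressors agree most."""
    view_a, view_b = random_subspaces(len(src_X[0]), seed)
    train_X, train_y = list(src_X), list(src_y)  # labeled source data
    pool = list(tgt_X)                           # unlabeled target data
    n_added = 0
    for _ in range(rounds):
        if not pool:
            break
        scored = []
        for i, x in enumerate(pool):
            pa = knn_predict(train_X, train_y, x, view_a, k)
            pb = knn_predict(train_X, train_y, x, view_b, k)
            # Assumed confidence score: small committee disagreement means high confidence.
            scored.append((abs(pa - pb), i, (pa + pb) / 2))
        scored.sort()
        chosen = scored[:per_round]
        for _, i, pseudo_label in chosen:
            train_X.append(pool[i])
            train_y.append(pseudo_label)
        chosen_idx = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in chosen_idx]
        n_added += len(chosen)

    def predict(x):
        # Final prediction averages the two view-specific regressors.
        pa = knn_predict(train_X, train_y, x, view_a, k)
        pb = knn_predict(train_X, train_y, x, view_b, k)
        return (pa + pb) / 2

    return predict, n_added
```

In this sketch both regressors share one growing labeled set, and each round moves the `per_round` target samples with the lowest disagreement into it, labeled with the mean of the two predictions. In practice the feature vectors would be textual features (for example word counts) extracted from user posts on each platform.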
