You are what you tweet…pic! gender prediction based on semantic analysis of social media images

We propose a method to extract user attributes, specifically gender, from the pictures posted in social media feeds. While traditional approaches rely on text analysis or exploit visual information only from the user's profile picture or colors, we propose to estimate gender from the distribution of semantics in the pictures across a person's whole feed. To compute this semantic distribution, we trained models from existing visual taxonomies to recognize objects, scenes and activities, and applied them to the images in each user's feed. Experiments conducted on a set of ten thousand Twitter users and their collection of half a million images revealed that the gender signal can indeed be extracted from a user's image feed (75.6% accuracy). Furthermore, the combination of visual cues proved almost as strong as textual analysis in predicting gender, while providing complementary information that boosts gender prediction accuracy to 88% when combined with textual data. As a byproduct of our investigation, we were also able to identify the semantic categories of posted pictures most strongly correlated with males and with females.
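As an illustration of the idea, the following minimal sketch pools per-image concept scores into one per-user semantic distribution and then combines a visual and a textual gender score. It is not the pipeline used in the experiments: the function names, the sum-and-normalize pooling, the weighted-average fusion, and the weight `w_visual` are all assumptions made for the example.

```python
from typing import Dict, List


def feed_semantic_distribution(image_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Pool per-image concept scores into one distribution for the whole feed.

    Each element of image_scores maps a concept name (object, scene or
    activity) to a classifier score for one image. Pooling by summing and
    normalizing is an assumption of this sketch.
    """
    totals: Dict[str, float] = {}
    for scores in image_scores:
        for concept, score in scores.items():
            totals[concept] = totals.get(concept, 0.0) + score
    norm = sum(totals.values())
    if norm == 0:
        # Empty feed or all-zero scores: return zeros instead of dividing by zero.
        return {concept: 0.0 for concept in totals}
    return {concept: value / norm for concept, value in totals.items()}


def late_fusion(p_visual: float, p_text: float, w_visual: float = 0.5) -> float:
    """Combine two classifier probabilities by weighted averaging (hypothetical choice)."""
    return w_visual * p_visual + (1.0 - w_visual) * p_text


# Toy feed with two images and made-up concept scores.
feed = [
    {"car": 0.8, "dog": 0.2},
    {"car": 0.4, "flower": 0.6},
]
distribution = feed_semantic_distribution(feed)
fused = late_fusion(p_visual=0.9, p_text=0.5)
```

In this sketch the per-user distribution would serve as the feature vector for a gender classifier, and the fusion step stands in for whatever combination of visual and textual predictions is actually used.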
