A General Gender Inference Method Based on Web

Gender information is a crucial part of human demographics, valuable for its rich connotations and potential applications. Although much effort has gone into the problem of gender inference, most existing methods depend heavily on data from specific sources, such as Twitter, and are difficult to generalize to other tasks. In this work, we propose a general Web-based method for gender inference. We show that our model significantly outperforms state-of-the-art methods without requiring much human effort or being restricted to specific scenarios. Building on this model, we also present a voting framework that efficiently combines several methods to further improve performance. Experiments show that our voting framework achieves 96.9% accuracy.

Keywords: data mining; gender prediction; demographics; big data
