Stereotypical gender actions can be extracted from web text

We extracted gender-specific actions from text corpora and Twitter and compared them with people's stereotypical expectations. We used Open Mind Common Sense (OMCS), a common sense knowledge repository, to focus on actions pertinent to common sense and everyday human life. We used the gender information of Twitter users, together with web-corpus-based pronoun/name gender heuristics, to compute the gender bias of each action. With high recall, we obtained a Spearman correlation of 0.47 between corpus-based predictions and a human gold standard, and an area under the ROC curve of 0.76 when predicting the polarity of the gold standard. We conclude that it is feasible to use natural text (and a Twitter-derived corpus in particular) to augment common sense repositories with the stereotypical gender expectations of actions. We also present a dataset of 441 common sense actions with human judges' ratings of whether each action is typically or slightly masculine, typically or slightly feminine, or neutral, and a larger dataset of 21,442 actions rated automatically by the methods investigated in this study. © 2011 Wiley Periodicals, Inc.
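
The evaluation described above can be illustrated with a minimal sketch. The abstract does not give the exact bias formula, so the smoothed normalized difference in `gender_bias` is an assumption, as are the function names, the sign convention (positive means masculine), the -2 to 2 rating scale, and the toy counts and ratings, which are invented purely for illustration and are not taken from the study's data.

```python
from math import sqrt

def gender_bias(male_count, female_count, smoothing=1.0):
    """Assumed bias score in [-1, 1]; positive means more male-associated."""
    m = male_count + smoothing
    f = female_count + smoothing
    return (m - f) / (m + f)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data: (male-attributed mentions, female-attributed mentions).
counts = {"action_a": (90, 10), "action_b": (60, 40), "action_c": (50, 50),
          "action_d": (30, 70), "action_e": (45, 55)}
# Invented human ratings on an assumed scale: -2 (typically feminine) to 2 (typically masculine).
human = {"action_a": 2, "action_b": -1, "action_c": 0, "action_d": -2, "action_e": 1}

actions = sorted(counts)
predicted = [gender_bias(*counts[a]) for a in actions]
gold = [human[a] for a in actions]
print("Spearman:", spearman(predicted, gold))

# Polarity prediction: drop neutral items, then treat "masculine" as the positive class.
polar = [(p, g > 0) for p, g in zip(predicted, gold) if g != 0]
print("AUC:", roc_auc([p for p, _ in polar], [l for _, l in polar]))
```

Rank correlation and AUC depend only on the ordering of the predicted scores, so any monotonic rescaling of the assumed bias score would leave both measures unchanged.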
