Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990

Contemporary debates on filter bubbles and polarization in public and social media raise the question of the extent to which news media of the past exhibited biases. This paper examines gender bias in six Dutch national newspapers between 1950 and 1990. We measure gender bias by comparing local changes in word embedding models trained on newspapers with divergent ideological backgrounds. We demonstrate clear differences in gender bias and changes within and between newspapers over time. In relation to themes such as sexuality and leisure, we see the bias moving toward women, whereas, generally, the bias shifts in the direction of men, despite growing female employment numbers and feminist movements. Even though Dutch society became less stratified ideologically (depillarization), we found an increasing divergence in gender bias between religious and social-democratic newspapers on the one hand and liberal newspapers on the other. Methodologically, this paper illustrates how word embeddings can be used to examine historical language change. Future work will investigate how fine-tuning deep contextualized embedding models, such as ELMo, might be used for similar tasks with greater contextual information.
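
The abstract does not spell out how the bias score is computed, so the following is only a minimal sketch of one common way to quantify gender bias in an embedding space, not the paper's actual pipeline. It assumes that bias for a theme is the mean difference between each target word's cosine similarity to the average "female" vector and to the average "male" vector, and that a shift over time is the difference between the scores of two separately trained models. The word lists, the three-dimensional toy vectors, and the names `gender_bias`, `emb_1950s`, and `emb_1980s` are all hypothetical and carry no empirical meaning.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def gender_bias(emb, targets, female_words, male_words):
    """Assumed bias measure: > 0 leans toward the female words, < 0 toward the male words."""
    female_center = mean_vector([emb[w] for w in female_words])
    male_center = mean_vector([emb[w] for w in male_words])
    scores = [
        cosine(emb[t], female_center) - cosine(emb[t], male_center)
        for t in targets
    ]
    return sum(scores) / len(scores)

# Hypothetical toy embeddings standing in for models trained on two periods.
emb_1950s = {
    "vrouw": [1.0, 0.0, 0.0], "zij": [0.9, 0.1, 0.0],
    "man": [0.0, 1.0, 0.0], "hij": [0.1, 0.9, 0.0],
    "sport": [0.2, 0.8, 0.1],
}
emb_1980s = {
    "vrouw": [1.0, 0.0, 0.0], "zij": [0.9, 0.1, 0.0],
    "man": [0.0, 1.0, 0.0], "hij": [0.1, 0.9, 0.0],
    "sport": [0.6, 0.4, 0.1],
}

FEMALE = ["vrouw", "zij"]   # illustrative word lists, not the paper's
MALE = ["man", "hij"]
LEISURE = ["sport"]         # a one-word stand-in for a thematic category

bias_1950s = gender_bias(emb_1950s, LEISURE, FEMALE, MALE)
bias_1980s = gender_bias(emb_1980s, LEISURE, FEMALE, MALE)
bias_shift = bias_1980s - bias_1950s  # positive: the theme moved toward women

print(f"1950s: {bias_1950s:+.3f}  1980s: {bias_1980s:+.3f}  shift: {bias_shift:+.3f}")
```

In a real study the dictionaries would be replaced by vectors from models trained per newspaper and period, and the target lists by thematic word categories. Comparing only relative similarities within each model avoids having to align the separately trained spaces, which is one plausible reason to prefer such a local comparison.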
