Gender-Aware Reinflection using Linguistically Enhanced Neural Models

In this paper, we present an approach for sentence-level gender reinflection using linguistically enhanced sequence-to-sequence models. Our system takes an Arabic sentence and a given target gender as input and generates a gender-reinflected sentence based on the target gender. We formulate the problem as a user-aware grammatical error correction task and build an encoder-decoder architecture to jointly model reinflection for both masculine and feminine grammatical genders. We also show that adding linguistic features to our model leads to better reinflection results. The results on a blind test set using our best system show improvements over previous work, with a 3.6% absolute increase in M2 F0.5.

Bias Statement

Most NLP systems are unaware of their users’ preferred grammatical gender. Such systems typically generate a single output for a specific input without considering any user information. Beyond being simply incorrect in many cases, such output patterns create representational harm by propagating social biases and inequalities of the world we live in. While such biases can be traced back to the NLP systems’ training data, balancing and cleaning the training data will not guarantee the correctness of a single output that is arrived at without accounting for user preferences. Our view is that NLP systems should utilize grammatical gender preference information to provide the correct user-aware output, particularly for gender-marking morphologically rich languages. When the grammatical gender preference information is unavailable to the systems, all gender-specific outputs should be generated and properly marked. We acknowledge that by limiting the choice of gender expression to the grammatical gender choices in Arabic, we exclude other alternatives such as non-binary gender or no-gender expressions. We are not aware of any published sociolinguistics research that discusses such alternatives for Arabic, although there are growing grassroots efforts, e.g., the Ebdal Project.
