Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding

Written language carries explicit and implicit biases. For example, letters of reference may describe male and female candidates differently, or their writing style may indirectly reveal the writer's demographic characteristics. At best, such biases distract from the meaningful content of the text; at worst, they can lead to unfair outcomes. We investigate the challenge of re-generating input sentences to ‘neutralize’ sensitive attributes while maintaining the semantic meaning of the original text (e.g., is the candidate qualified?). We propose a gradient-based rewriting framework, Detect and Perturb to Neutralize (DEPEN), that first detects sensitive components and masks them for regeneration, then perturbs the generation model at decoding time under a neutralizing constraint that pushes the (predicted) distribution of sensitive attributes towards a uniform distribution. Our experiments in two different scenarios show that DEPEN can regenerate fluent alternatives that are neutral in the sensitive attribute while maintaining the semantics of other attributes.
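The core of the neutralizing constraint above can be illustrated with a minimal, self-contained sketch: given logits that an (assumed) attribute classifier assigns to a sensitive attribute, take gradient steps that minimize the KL divergence between the predicted attribute distribution and the uniform distribution. The classifier, the 3-way attribute space, the step size, and the step count below are illustrative assumptions, not details from the paper; the actual method perturbs the generation model's hidden states at decoding time rather than raw logits.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_to_uniform(probs):
    # KL(p || uniform) = sum_i p_i * log(p_i * K), where K = len(probs).
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs)

def neutralize(logits, lr=1.0, steps=200):
    # Gradient of KL(softmax(z) || uniform) w.r.t. z_i is
    # p_i * (log p_i - sum_j p_j log p_j); descend along it so the
    # attribute distribution drifts toward uniform.
    z = list(logits)
    for _ in range(steps):
        p = softmax(z)
        neg_ent = sum(pi * math.log(pi) for pi in p)
        grad = [pi * (math.log(pi) - neg_ent) for pi in p]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

# Hypothetical attribute logits strongly favoring class 0 (e.g. "male").
biased = [3.0, 0.5, -1.0]
neutral = neutralize(biased)
print(kl_to_uniform(softmax(biased)), kl_to_uniform(softmax(neutral)))
```

In the full framework this gradient would flow through the attribute classifier into the decoder's hidden states (as in plug-and-play controlled generation), balanced against a fluency term; the sketch isolates only the uniformity objective.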
