Fair Generative Modeling via Weak Supervision

Real-world datasets are often biased with respect to key demographic factors such as race and gender. Because the underlying factors are latent, detecting and mitigating bias is especially challenging for unsupervised machine learning. We present a weakly supervised algorithm for overcoming dataset bias for deep generative models. Our approach requires access to an additional small, unlabeled reference dataset as the supervision signal, thus sidestepping the need for explicit labels on the underlying bias factors. Using this supplementary dataset, we detect the bias in existing datasets via a density ratio technique and learn generative models that achieve two goals: 1) data efficiency, by using training examples from both the biased and reference datasets for learning; and 2) data generation close in distribution to the reference dataset at test time. Empirically, we demonstrate the efficacy of our approach, which reduces bias w.r.t. latent factors by an average of up to 34.6% over baselines for comparable image generation using generative adversarial networks.
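
To make the density ratio idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the ratio between the reference and biased distributions is estimated with a binary classifier, which is one common density ratio technique; the abstract does not specify the estimator. The 1-D Gaussian mixture, the hidden group proportions, and the helper names `sample_mixture` and `fit_logistic` are all hypothetical choices made for illustration.

```python
import math
import random


def sample_mixture(n, frac_b, rng):
    """Draw n points from a toy 1-D mixture (an assumption for illustration).

    A hidden group B (mean +2) appears with probability frac_b,
    otherwise the point comes from group A (mean -2).
    """
    return [rng.gauss(2.0 if rng.random() < frac_b else -2.0, 1.0)
            for _ in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=300, lr=0.5):
    """Fit a 1-D logistic classifier with full-batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


rng = random.Random(0)
biased = sample_mixture(2000, 0.1, rng)    # large dataset, group B under-represented
reference = sample_mixture(500, 0.5, rng)  # small, unlabeled, balanced reference set

# Label 1 = reference, 0 = biased; group membership is never observed.
xs = biased + reference
ys = [0.0] * len(biased) + [1.0] * len(reference)
a, b = fit_logistic(xs, ys)

# For a classifier c(x), c / (1 - c) = exp(logit). Multiplying by
# n_biased / n_reference corrects for the unequal dataset sizes, giving
# an estimate of p_reference(x) / p_biased(x).
prior = len(biased) / len(reference)
weights = [prior * math.exp(a * x + b) for x in biased]

# Share of biased-dataset mass on the x > 0 side, before and after reweighting.
unweighted = sum(1 for x in biased if x > 0) / len(biased)
weighted = sum(w for x, w in zip(biased, weights) if x > 0) / sum(weights)
print(f"share with x > 0: unweighted={unweighted:.3f}, reweighted={weighted:.3f}")
```

In this toy setup the importance weights shift the biased dataset's mass toward the under-represented side, so reweighted biased examples can be used alongside the reference examples during training. A linear classifier only approximates the true ratio here, so the correction is partial; a more flexible classifier would be the natural choice for images.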
