Recommendation or Discrimination?: Quantifying Distribution Parity in Information Retrieval Systems

Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features of the human models rather than fashion garments when generating recommendations. In this work, we introduce a statistical test for "distribution parity" in the top-K IR results, which assesses whether a given set of recommendations is fair with respect to a specific protected variable. We evaluate our test using both simulated and empirical results. First, using artificially biased recommendations, we demonstrate the trade-off between statistically detectable bias and the size of the search catalog. Second, we apply our test to a visual search system for fashion garments, specifically testing for recommendation bias based on the skin tone of fashion models. Our distribution parity test can help ensure that IR systems' results are fair and produce a good experience for all users.
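The abstract does not spell out the form of the distribution parity test, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a binary protected attribute, treats the top-K results as independent draws, and uses an exact two-sided binomial test to ask whether the share of one group in the top-K is consistent with that group's share of the catalog. The function name `binomial_parity_pvalue` and the example counts are hypothetical.

```python
from math import comb


def binomial_parity_pvalue(k_group: int, k_total: int, catalog_share: float) -> float:
    """Exact two-sided binomial p-value (illustrative assumption, not the paper's test).

    k_group:       number of top-K results belonging to the group of interest
    k_total:       K, the number of results returned
    catalog_share: fraction of the whole catalog belonging to that group
    """
    # Probability of every possible group count 0..K under the null hypothesis
    # that results are drawn in proportion to the catalog.
    pmf = [
        comb(k_total, i) * catalog_share**i * (1 - catalog_share) ** (k_total - i)
        for i in range(k_total + 1)
    ]
    observed = pmf[k_group]
    # Sum the probabilities of all outcomes at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Hypothetical example: a top-20 list drawn from a catalog split 50/50 between two groups.
p_skewed = binomial_parity_pvalue(18, 20, 0.5)    # 18 of 20 results from one group
p_balanced = binomial_parity_pvalue(10, 20, 0.5)  # 10 of 20 results from one group
print(f"skewed list:   p = {p_skewed:.6f}")
print(f"balanced list: p = {p_balanced:.6f}")
```

A small p-value for the skewed list suggests that its group composition would be unlikely if results were drawn in proportion to the catalog, while the balanced list gives no such evidence. A test for a real system would also need to account for sampling without replacement from a finite catalog and for protected variables with more than two levels.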
