Auditing Search Engines for Differential Satisfaction Across Demographics

Many online services, such as search engines, social media platforms, and digital marketplaces, are advertised as being available to any user, regardless of their age, gender, or other demographic factors. However, there are growing concerns that these services may systematically underserve some groups of users. In this paper, we present a framework for internally auditing such services for differences in user satisfaction across demographic groups, using search engines as a case study. We first explain the pitfalls of naively comparing the behavioral metrics that are commonly used to evaluate search engines. We then propose three methods for measuring latent differences in user satisfaction from observed differences in evaluation metrics. To develop these methods, we draw on ideas from the causal inference and multilevel modeling literatures. Our framework is broadly applicable to other online services, and provides general insight into interpreting their evaluation metrics.
