Beliefs and biases in web search

People's beliefs, and the unconscious biases that arise from those beliefs, influence their judgment, decision making, and actions, as is commonly accepted among psychologists. Biases can be observed in information retrieval in situations where searchers seek or are presented with information that significantly deviates from the truth. There is little understanding of the impact of such biases in search. In this paper we study search-related biases via multiple probes: an exploratory retrospective survey, human labeling of the captions and results returned by a Web search engine, and a large-scale log analysis of search behavior on that engine. Targeting yes-no questions in the critical domain of health search, we show that Web searchers exhibit their own biases and are also subject to bias from the search engine. We clearly observe searchers favoring positive information over negative, and doing so more than would be expected given base rates derived from consensus answers from physicians. We also show that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth. Importantly, we show that these biases can be counterproductive and affect search outcomes; in our study, around half of the answers that searchers settled on were actually incorrect. Our findings have implications for search engine design, including the development of ranking algorithms that consider both the desire to satisfy searchers (by validating their beliefs) and the need to provide accurate answers and properly account for base rates. Incorporating likelihood information into search is particularly important for consequential tasks, such as those with a medical focus.
