A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines

A growing body of work has highlighted the important role that Wikipedia's volunteer-created content plays in helping search engines achieve their core goal of addressing the information needs of millions of people. In this paper, we report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs). Our results extend prior work by considering three U.S. search engines, simulating both mobile and desktop devices, and using a spatial analysis approach designed to study modern SERPs that are no longer just "ten blue links". We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries. Furthermore, we observe that Wikipedia links often appear in "Knowledge Panel" SERP elements and in positions visible to users without scrolling, although Wikipedia appears less often in prominent positions on mobile devices. Our findings reinforce the complementary notions that (1) Wikipedia content and research have a major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.
