Measuring the Importance of User-Generated Content to Search Engines

Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content, such as Wikipedia articles, to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g., most popular, trending, expensive advertising), we observe that Wikipedia appears in over 80% of results pages for some query types and is by far the most prevalent individual content source across all query types. More generally, our results provide empirical evidence to inform a nascent but rapidly growing debate surrounding a highly consequential question: Do users provide enough value to intelligent technologies that they should receive more of the economic benefits those technologies generate?
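The headline measurement above, the share of results pages in which a given source appears, can be illustrated with a minimal sketch. This is not the paper's audit pipeline: the function names, the host-matching rule, and the toy URLs below are all assumptions made for illustration, and each results page is assumed to be already reduced to a plain list of result URLs.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def page_contains(domain, result_urls):
    """True if any result is hosted on `domain` or one of its subdomains."""
    suffix = "." + domain
    return any(
        host == domain or host.endswith(suffix)
        for host in map(host_of, result_urls)
    )


def incidence_rate(pages, domain="wikipedia.org"):
    """Fraction of results pages with at least one result from `domain`.

    `pages` is a list of results pages, each a list of result URLs
    (a hypothetical input format, not the paper's data schema).
    """
    if not pages:
        return 0.0
    hits = sum(page_contains(domain, urls) for urls in pages)
    return hits / len(pages)


# Toy data for illustration only; these are not results from the paper.
toy_pages_by_query_type = {
    "trending": [
        ["https://en.wikipedia.org/wiki/Example", "https://example.com/a"],
        ["https://example.com/b"],
    ],
    "popular": [
        ["https://example.org/c", "https://de.wikipedia.org/wiki/Beispiel"],
    ],
}

for query_type, pages in toy_pages_by_query_type.items():
    print(query_type, incidence_rate(pages))
```

Matching on the hostname suffix, rather than on a substring of the URL, avoids counting unrelated sites whose names merely contain "wikipedia.org".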
