Topical interests and the mitigation of search engine bias

Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the web despite its size and complexity. On the downside, search engines bias user traffic according to their page ranking strategies, and it has been argued that they create a vicious cycle that amplifies the dominance of established, already popular sites. This bias could lead to a dangerous monopoly of information. We show that, contrary to intuition, empirical data do not support this conclusion; popular sites receive far less traffic than predicted. We discuss a model that accurately predicts traffic data patterns by taking into consideration the topical interests of users and their searching behavior, in addition to the way search engines rank pages. The heterogeneity of user interests explains the observed mitigation of search engines’ popularity bias.
