Webometrics: Some Critical Issues of WWW Size Estimation Methods

The number of webpages on the Internet has increased tremendously over the last two decades; however, only a part of them is indexed by search engines. This small portion is the indexable web, which is usually reachable through a search engine. Search engines play a big role in making the World Wide Web accessible to the end user, and how much of the World Wide Web is accessible depends on the size of the search engine’s index. Researchers have proposed several ways to estimate the size of the indexable web using search engines, both with and without privileged access to the search engine’s database. Our report summarizes the methods used over the last two decades to estimate the size of the World Wide Web and describes how this knowledge can be used in other tasks concerning the World Wide Web.
