A survey of Web metrics

The unabated growth and increasing significance of the World Wide Web have resulted in a flurry of research activity to improve its capacity for serving information more effectively. At the heart of these efforts, however, lie implicit assumptions about the "quality" and "usefulness" of Web resources and services. This observation points towards measurements and models that quantify various attributes of Web sites. Informetrics, the science of measuring all aspects of information, especially its storage and retrieval, interested information scientists for decades before the Web existed. Is Web informetrics any different, or is it just an application of classical informetrics to a new medium? In this article, we examine this issue by classifying and discussing a wide-ranging set of Web metrics. We present the origins, measurement functions, formulations, and comparisons of well-known Web metrics for quantifying Web graph properties, Web page significance, Web page similarity, search and retrieval, usage characterization, and information-theoretic properties. We also discuss how these metrics can be applied to improve Web information access and use.
