SkyServer Traffic Report - The First Five Years

The SkyServer is an Internet portal to the Sloan Digital Sky Survey Catalog Archive Server. From 2001 to 2006, a million visitors in 3 million sessions generated 170 million Web hits, 16 million ad-hoc SQL queries, and 62 million page views. The site currently averages 35 thousand visitors and 400 thousand sessions per month. The Web and SQL logs are public. We analyzed traffic and sessions over time by duration, usage pattern, data product, and client type (mortal or bot). The analysis shows (1) the site's popularity, (2) the nearly fifty thousand hours of interactive instruction delivered by the educational website, (3) the relative use of interactive, programmatic, and batch-local access, (4) the success of offering ad-hoc SQL, personal database, and batch job access to scientists as part of the data publication, (5) the continuing interest in "old" datasets, (6) the usage of SQL constructs, and (7) a novel approach that uses the corpus of correct SQL queries to suggest similar, correct statements when a user submits an incorrect SQL statement.
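Item (7) can be illustrated with a minimal sketch. The abstract does not say how similarity between queries is measured, so the token-set Jaccard similarity used here is an assumption, not the report's method. The function names (`tokenize`, `jaccard`, `suggest`) and the three-query corpus are hypothetical and exist only for illustration.

```python
import re

def tokenize(sql):
    """Lowercase a SQL string and split it into identifier, number, and symbol tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\s\w]", sql.lower())

def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets (assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def suggest(bad_query, corpus, k=3):
    """Return the k queries from a corpus of known-correct queries most similar to bad_query."""
    bad = tokenize(bad_query)
    ranked = sorted(corpus, key=lambda q: jaccard(bad, tokenize(q)), reverse=True)
    return ranked[:k]

# Hypothetical stand-in for the logged corpus of correct queries.
corpus = [
    "SELECT TOP 10 objID, ra, dec FROM PhotoObj",
    "SELECT COUNT(*) FROM SpecObj WHERE z > 0.1",
    "SELECT ra, dec FROM Galaxy WHERE r < 17",
]

# A query with two misspelled keywords; the closest correct query is offered instead.
best = suggest("SELEC TOP 10 objID, ra, dec FORM PhotoObj", corpus, k=1)
print(best[0])
```

A set-based measure ignores token order, which keeps the sketch short; a real system would more likely weight rare tokens or compare token sequences.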
