Wikipedia Workload Analysis

We study an access trace containing a sample of Wikipedia’s traffic over a 107-day period. We perform a global analysis of the whole trace and a detailed analysis of the requests directed to the English edition of Wikipedia. In our study, we classify client requests and examine aspects such as the number of read and save operations, flash crowds, and requests for nonexistent pages. We also outline strategies for improving Wikipedia's performance in a decentralized hosting environment.
