Request distribution-aware caching in cluster-based Web servers

This work presents a performance analysis of request distribution-aware caching in cluster-based Web servers. We use the Zipf-like request distribution curve to guide the caching of static Web documents. Combining cooperative caching with exclusive caching yields a cluster-wide caching system that avoids document replication across the cluster. We explore the benefits of cooperative caching algorithms that use request distribution information to steer their behavior, compared with general-purpose cooperative caching algorithms. Exclusive caching exercises fine-grained control over the replication of data blocks across the cluster. We assess the performance of the system with the WebStone benchmark. Our cluster-based server employs Linux kernel-level implementations of cooperative caching and exclusive caching. Current results show that request distribution-aware caching outperforms general-purpose caching algorithms, makes up for the performance loss of non-replicated data solutions, and compares favorably to fully replicated solutions.
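To illustrate why a Zipf-like request curve matters for replication decisions, the following minimal sketch compares the ideal hit ratio of a cluster whose nodes all cache the same most popular documents against one that caches each document at most once cluster-wide. It is not the kernel-level implementation evaluated in this work. The function names, the exponent `alpha`, the document count, and the cache sizes are all assumptions chosen for illustration, and the model ignores document sizes, remote-access cost, and replacement policy.

```python
def zipf_probabilities(num_docs, alpha=0.8):
    """Request probability per popularity rank under a Zipf-like law.

    The document at rank i (1 = most popular) gets weight 1 / i**alpha.
    alpha is an assumed value, not one taken from the paper.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hit_ratio(probs, cached_docs):
    """Ideal hit ratio when the cached_docs most popular documents are cached."""
    cached_docs = min(cached_docs, len(probs))
    return sum(probs[:cached_docs])


def compare(num_docs=10000, nodes=4, per_node_capacity=500, alpha=0.8):
    """Return (replicated, exclusive) cluster-wide hit ratios.

    replicated: every node holds the same top per_node_capacity documents.
    exclusive:  no document is held twice, so the cluster as a whole
                holds the top nodes * per_node_capacity documents.
    """
    probs = zipf_probabilities(num_docs, alpha)
    replicated = hit_ratio(probs, per_node_capacity)
    exclusive = hit_ratio(probs, nodes * per_node_capacity)
    return replicated, exclusive


if __name__ == "__main__":
    replicated, exclusive = compare()
    print(f"replicated hit ratio: {replicated:.3f}")
    print(f"exclusive hit ratio:  {exclusive:.3f}")
```

Under these assumptions the exclusive configuration never has a lower hit ratio than the replicated one, because it covers a superset of the popularity ranks. The sketch deliberately leaves out the cost of fetching a document from another node's memory.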
