Finding Shared Fragments in Large Collections of Web Pages for Fragment-Based Web Caching

To reduce network-related delays in serving dynamic Web pages, various approaches have been proposed. However, a fundamental problem common to several of them is how to automatically find shared fragments in large numbers of Web pages. This paper gives a formal definition of the problem and presents an efficient, scalable algorithm for solving it. The algorithm has been implemented and applied to 16 large sets of Web pages. The experiments show that it can provide average bandwidth savings ranging from 59.79% to 72.28% in fragment-based Web caching.
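The abstract does not describe the algorithm itself. Purely to illustrate the problem, the Python sketch below makes several assumptions. It assumes each page has already been segmented into fragment strings (a hypothetical preprocessing step). It treats a fragment as shared when the fragment appears verbatim in at least `min_pages` distinct pages. It also estimates bandwidth savings by assuming each shared fragment is transferred once and then served from a cache. This exact-match counting is a naive stand-in, not the paper's algorithm. The savings estimate ignores the cost of fragment references and cache misses, and the helper names `shared_fragments` and `bandwidth_savings` are hypothetical.

```python
from collections import defaultdict


def shared_fragments(pages, min_pages=2):
    """Return fragments occurring in at least `min_pages` distinct pages.

    Naive exact-match sketch (not the paper's algorithm): each page is
    assumed to be pre-segmented into a list of fragment strings.
    """
    # Map each fragment to the set of page indices that contain it.
    occurrences = defaultdict(set)
    for i, page in enumerate(pages):
        for frag in page:
            occurrences[frag].add(i)
    return {frag for frag, ids in occurrences.items() if len(ids) >= min_pages}


def bandwidth_savings(pages, shared):
    """Fraction of bytes saved if every shared fragment is sent once and
    then served from a cache. Reference overhead is ignored (assumption).
    """
    total = sum(len(frag) for page in pages for frag in page)
    already_cached = set()
    sent = 0
    for page in pages:
        for frag in page:
            if frag in shared:
                if frag in already_cached:
                    continue  # cache hit: no bytes transferred
                already_cached.add(frag)  # first transfer populates the cache
            sent += len(frag)
    return 1 - sent / total if total else 0.0


# Usage: two pages that share a header and a footer.
pages = [
    ["<header>Site</header>", "<p>Article A</p>", "<footer>(c)</footer>"],
    ["<header>Site</header>", "<p>Article B</p>", "<footer>(c)</footer>"],
]
shared = shared_fragments(pages)
savings = bandwidth_savings(pages, shared)
```

Here the header and footer are transferred only with the first page, so the second page costs only its article fragment. A real system would also need to find shared fragments that are not already aligned as identical strings, which is the harder problem the paper addresses.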