Chunk and object level deduplication for web optimization: A hybrid approach

Proxy caches and Redundancy Elimination (RE) systems have been used to remove redundant bytes on WAN links. However, both have inherent deficiencies: proxy caches provide smaller savings than RE systems, while RE systems are limited by speed, memory, and storage overhead. In this paper we advocate a hybrid approach in which each type of cache acts as a module in a system with shared memory and storage space. A static scheduler precedes the cache modules and determines which types of traffic are forwarded to which module. We also propose several optimizations for each module that minimize storage and memory overhead. We evaluate the proposed system by performing a trace-driven emulation. Our results indicate that a hybrid system provides greater savings than either a proxy cache or a standalone RE system. Compared to an RE system, the hybrid system requires less memory and less disk space and provides a speed-up ratio of three.
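
As an illustration of the static scheduler mentioned above, the following minimal sketch routes traffic to one of two modules using a fixed rule table. It is not the paper's implementation: the abstract does not state which traffic types go to which module, so the rule table, the use of the content type as the routing key, and the names `Module`, `STATIC_RULES`, `DEFAULT_MODULE`, and `schedule` are all hypothetical and chosen only to show the general shape of such a component.

```python
from enum import Enum


class Module(Enum):
    """The two cache modules of the hybrid system (names are hypothetical)."""
    OBJECT_CACHE = "object_cache"  # proxy-cache style, object-level deduplication
    CHUNK_RE = "chunk_re"          # RE style, chunk-level deduplication


# Hypothetical static rules: content-type prefix -> module.
# These mappings are illustrative assumptions, not taken from the paper.
STATIC_RULES = [
    ("image/", Module.OBJECT_CACHE),
    ("video/", Module.OBJECT_CACHE),
    ("text/", Module.CHUNK_RE),
]

# Assumed fallback for traffic that matches no rule.
DEFAULT_MODULE = Module.CHUNK_RE


def schedule(content_type: str) -> Module:
    """Return the module chosen by the first matching static rule."""
    normalized = content_type.strip().lower()
    for prefix, module in STATIC_RULES:
        if normalized.startswith(prefix):
            return module
    return DEFAULT_MODULE
```

Because the rules are fixed ahead of time, the decision costs only a few prefix comparisons per response and needs no runtime state; a real scheduler could equally key on other properties of the traffic.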