Insights into access patterns of internet media systems: measurements, analysis, and system design

With the dramatic increase of media traffic on the Internet, existing media systems have shown inefficient resource utilization and performance bottlenecks in delivering high-quality media services. Although a number of measurement studies have observed that media access patterns are inconsistent with the Zipf-like distributions of Web workloads, existing media system designs and evaluations still assume that media workloads have the same access patterns as conventional Web workloads. An insightful understanding of media access patterns is essential to guide Internet system design and management, including resource provisioning and performance optimization. In this Ph.D. dissertation, we analyze the access patterns of Internet media systems and study effective system designs for large-scale media content delivery. Through extensive measurements on the Internet, we find that current media systems tend to over-supply or over-utilize server hardware and network bandwidth to provide high-quality media service, which is neither a scalable nor an effective approach for serving the explosively increasing media traffic on the Internet. We then systematically study the access patterns of different kinds of Internet media systems in order to exploit the temporal locality among media requests for efficient, high-performance system design. Our study shows that the reference ranks of media objects on the Internet follow a stretched exponential distribution, regardless of the underlying systems and delivery techniques used. With this kind of access pattern, media caching in a client-server model is far less effective than Web content caching. We further analyze the evolution of object reference rank distributions in long-duration media workloads and find that the temporal locality in media systems increases with time. Thus, long-term caching is beneficial for improving the performance of media systems.
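As a rough illustration of why the shape of the reference rank distribution matters for caching, the sketch below compares the share of requests that a cache of the k most popular objects could serve under a Zipf-like workload and under a stretched exponential one. It is only a sketch under stated assumptions: the rank form y_i^c = b - a ln(i), the normalization y_n = 1, the parameter values `c`, `a`, and `alpha`, and all function names are illustrative choices that do not come from the abstract.

```python
import math


def zipf_counts(n, alpha=1.0):
    """Reference counts proportional to 1 / rank**alpha (Zipf-like).

    alpha is a hypothetical skew parameter chosen for illustration.
    """
    return [1.0 / (i ** alpha) for i in range(1, n + 1)]


def stretched_exp_counts(n, c=0.3, a=0.5):
    """Reference counts y_i satisfying y_i**c = b - a * ln(i).

    Assumed parameterization: b is fixed so that the least popular
    object (rank n) has count 1. c and a are hypothetical values.
    """
    b = 1.0 + a * math.log(n)
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]


def top_k_hit_ratio(counts, k):
    """Fraction of all requests that target the k top-ranked objects.

    counts must already be ordered by rank (most popular first).
    """
    return sum(counts[:k]) / sum(counts)


if __name__ == "__main__":
    n = 10000  # hypothetical number of distinct objects
    for k in (10, 100, 1000):
        z = top_k_hit_ratio(zipf_counts(n), k)
        s = top_k_hit_ratio(stretched_exp_counts(n), k)
        print(f"cache {k:>5} objects: zipf={z:.3f} stretched_exp={s:.3f}")
```

Running the script prints the two hit ratios side by side for several cache sizes. The outcome depends on the chosen parameters, so fitted values from a real trace would be needed before drawing conclusions.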
However, long-term caching requires a large amount of storage, for which the peer-to-peer (P2P) model is attractive. Our stretched exponential model lays an analytical foundation for building peer-to-peer caching systems to deliver the huge amount of media content on the Internet. We further conduct a performance study of BitTorrent-like P2P systems for large-scale media delivery. Through modeling and analysis, we find that although the existing BitTorrent system is effective in addressing the “flash crowd” problem upon the debut of a new file, it suffers from service unavailability and performance instability after a period of time, due to the exponentially decreasing peer arrival rate. We then quantitatively analyze the interaction among multiple BitTorrent systems with a graph-based model, and show that inter-torrent collaboration is much more effective than stimulating seed peers to serve longer in addressing the service and performance problems of BitTorrent systems. Finally, we propose PROP, a P2P-assisted media caching system, which uses peer-to-peer collaboration to provide service scalability and dedicated servers to provide service reliability.
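The phrase "exponentially decreasing peer arrival rate" can be made concrete with a short worked equation. This is a sketch, not the dissertation's own notation: the symbols \lambda_0 (initial arrival rate), \tau (decay time constant), and N(t) (cumulative number of peers that have arrived by time t) are assumed here for illustration.

```latex
\lambda(t) = \lambda_0 \, e^{-t/\tau},
\qquad
N(t) = \int_0^{t} \lambda(s)\, ds
     = \lambda_0 \tau \left(1 - e^{-t/\tau}\right)
\;\xrightarrow{\; t \to \infty \;}\; \lambda_0 \tau .
```

Under this assumed form, the total number of peers that ever join a torrent is bounded by \lambda_0 \tau, and new arrivals become rare once t is several multiples of \tau. That is consistent with the service unavailability and performance instability described above for torrents some time after their debut.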