Workload Characterization and Performance Implications of Large-Scale Blog Servers

With the ever-increasing popularity of Social Network Services (SNSs), it is critical to understand the characteristics of these services and their effects on the behavior of their host servers. However, little research has addressed the workload characterization of servers running SNS applications such as blog services. To fill this void, we empirically characterized real-world Web server logs collected over 12 consecutive days from one of the largest South Korean blog hosting sites. The logs consist of more than 96 million HTTP requests and 4.7 TB of network traffic. Our analysis reveals the following: (i) the transfer sizes of non-multimedia files and blog articles can be modeled using a truncated Pareto distribution and a log-normal distribution, respectively; (ii) user access to blog articles does not show temporal locality, but is strongly biased towards articles posted with image or audio files. We additionally discuss the potential performance improvement from clustering the small files on a blog page into contiguous disk blocks, an approach that benefits from the observed file access patterns. Trace-driven simulations show that, on average, the suggested approach achieves 60.6% better system throughput and reduces the processing time for file access by 30.8% compared to the best performance of the Ext4 filesystem.
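As an illustration of the two size models named in finding (i), the following minimal sketch draws synthetic transfer sizes from a truncated Pareto distribution (by inverting its CDF) and from a log-normal distribution. All parameter values (`alpha`, `lo`, `hi`, `mu`, `sigma`) and the function names are hypothetical placeholders chosen for illustration; they are not the fitted values from the study.

```python
import random


def sample_truncated_pareto(alpha, lo, hi, rng):
    """Draw one value from a Pareto distribution truncated to [lo, hi].

    Uses inverse-CDF sampling with
    F(x) = (1 - (lo/x)**alpha) / (1 - (lo/hi)**alpha) for lo <= x <= hi.
    """
    u = rng.random()                      # uniform in [0, 1)
    tail = (lo / hi) ** alpha             # probability mass cut off beyond hi
    return lo * (1.0 - u * (1.0 - tail)) ** (-1.0 / alpha)


def sample_lognormal(mu, sigma, rng):
    """Draw one value whose natural logarithm is Normal(mu, sigma)."""
    return rng.lognormvariate(mu, sigma)


# Hypothetical parameters, NOT taken from the paper.
rng = random.Random(42)
LO, HI = 1_000.0, 10_000_000.0            # assumed size bounds in bytes
file_sizes = [sample_truncated_pareto(1.2, LO, HI, rng) for _ in range(10_000)]
article_sizes = [sample_lognormal(9.0, 1.0, rng) for _ in range(10_000)]

print(f"non-multimedia file sizes: min={min(file_sizes):.0f}, max={max(file_sizes):.0f}")
print(f"article sizes: min={min(article_sizes):.0f}, max={max(article_sizes):.0f}")
```

Sizes generated this way could feed a synthetic workload generator; reproducing the reported results would require the fitted parameters from the full paper.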
