Workload Generators for Web-Based Systems: Characteristics, Current Status, and Challenges

The World Wide Web (WWW) has grown and evolved rapidly over the last ten years, driven mainly by the social Web and mobile technology. Sustaining this growth means serving millions of users of Web applications with adequate quality of service, which requires continuous changes to the infrastructure to improve user experience and to handle new demands. Studies of Web-based systems that compare hardware infrastructures, detect system bottlenecks, provision hardware resources, plan capacity, or assess software testability are therefore of great interest. However, new trends in the WWW have brought new types of user demands and interactions that produce complex workload patterns. These patterns must be studied thoroughly and taken into account when designing workload generators that produce traces representative of current reality. This survey aims to guide researchers working on the Web, social networking, and other Internet-related topics through the main points and concerns of workload generation for Web-based systems. It reviews the predominant characteristics and attributes that define Web workloads, including the special cases of specific types of Web applications (e.g., blogs, online social network platforms, and video-sharing services). It also identifies the main challenges for the next generation of Web workload generators and explores current approaches and solutions suggested in recent work.
