Characterizing Computer Systems' Workloads

The performance of a system cannot be determined without knowing its workload, that is, the set of requests presented to the system. Workload characterization is the process of producing models that can describe and reproduce the behavior of a workload. Such models are essential to performance-related studies such as capacity planning, workload balancing, performance prediction, and system tuning. In this paper, we survey workload characterization techniques used for several types of computer systems. We identify significant issues and concerns encountered during the characterization process and propose an augmented methodology for workload characterization as a framework. We believe that the surveyed case studies, the described characterization techniques, and the proposed framework give a good introduction to the topic, help in exploring the characterization tools that can be adopted, and provide general guidelines for deriving a workload model suitable as input to performance studies.
