Exploiting nonstationarity for performance prediction

Real production applications, ranging from enterprise applications to large e-commerce sites, share a crucial but seldom-noted characteristic: the relative frequencies of transaction types in their workloads are nonstationary, i.e., the transaction mix changes over time. Accurately predicting application-level performance in business-critical production applications is an increasingly important problem. However, transaction mix nonstationarity casts doubt on the practical usefulness of prediction methods that ignore this phenomenon. This paper demonstrates that transaction mix nonstationarity enables a new approach to predicting application-level performance as a function of transaction mix. We exploit nonstationarity to circumvent the need for invasive instrumentation and controlled benchmarking during model calibration; our approach relies solely on lightweight passive measurements that are routinely collected in today's production environments. We evaluate predictive accuracy on two real business-critical production applications. Our response time predictions are accurate to within 10% to 16% on these applications, and our models generalize well to workloads very different from those used for calibration. We apply our technique to the challenging problem of predicting the impact of application consolidation on transaction response times. We calibrate models of two testbed applications running on dedicated machines, then use the models to predict their performance when they run together on a shared machine and serve very different workloads. Our predictions are accurate to within 4% to 14%. Whereas existing approaches to consolidation decision support predict post-consolidation resource utilizations, our method allows application-level performance to guide consolidation decisions.
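To illustrate why a changing transaction mix helps calibration from passive data, consider a deliberately simplified model that is an assumption made for illustration, not the model used in this paper: in each measurement window t, the sum of response times y_t is approximated as a1·n1 + a2·n2, where n_j is the number of type-j transactions completed in the window and a_j is an unknown per-type cost. The sketch below fits a1 and a2 by ordinary least squares from per-window counts and totals. When the mix never changes, the two count columns are proportional, so the per-type costs cannot be separated; a nonstationary mix makes them identifiable. The function names, costs, and window counts are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical simplified model (illustration only, not the paper's model):
# for each measurement window t, total response time y_t ~= a1*n1_t + a2*n2_t,
# where n_j is the count of type-j transactions and a_j an unknown per-type cost.


def fit_two_type_costs(
    counts: List[Tuple[float, float]], totals: List[float]
) -> Tuple[float, float]:
    """Ordinary least squares (no intercept) for the two per-type costs."""
    # Entries of the normal-equation matrix X^T X and the vector X^T y.
    s11 = sum(n1 * n1 for n1, _ in counts)
    s12 = sum(n1 * n2 for n1, n2 in counts)
    s22 = sum(n2 * n2 for _, n2 in counts)
    b1 = sum(n1 * y for (n1, _), y in zip(counts, totals))
    b2 = sum(n2 * y for (_, n2), y in zip(counts, totals))
    det = s11 * s22 - s12 * s12
    # A stationary mix makes the two count columns proportional, so X^T X is
    # singular and the costs cannot be separated from passive data alone.
    if abs(det) <= 1e-9 * max(1.0, s11 * s22):
        raise ValueError("transaction mix is (nearly) constant; costs not identifiable")
    # Solve the 2x2 normal equations by Cramer's rule.
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return a1, a2


def predict_mean_response_time(costs: Tuple[float, float], n1: float, n2: float) -> float:
    """Predicted mean response time for a (possibly unseen) mix of n1 and n2."""
    a1, a2 = costs
    return (a1 * n1 + a2 * n2) / (n1 + n2)


if __name__ == "__main__":
    true_costs = (0.02, 0.15)  # hypothetical seconds per transaction
    # Nonstationary windows: the ratio of type-1 to type-2 traffic varies.
    windows = [(100, 10), (80, 30), (50, 60), (120, 5)]
    totals = [true_costs[0] * n1 + true_costs[1] * n2 for n1, n2 in windows]
    costs = fit_two_type_costs(windows, totals)
    print("fitted costs:", costs)
    print("predicted mean for mix (10, 90):", predict_mean_response_time(costs, 10, 90))

    # Stationary windows: every window has the same 10:1 mix.
    try:
        fit_two_type_costs([(100, 10), (200, 20), (50, 5)], [3.5, 7.0, 1.75])
    except ValueError as err:
        print("stationary mix:", err)
```

With the varying windows, the fit recovers the hypothetical costs, and the fitted model can then be evaluated on a mix (10, 90) that never occurred during calibration; with a constant 10:1 mix, the same data cannot distinguish the two costs. The sketch ignores queueing delay, which grows with utilization, so under these assumptions it is only meaningful at light load.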
