Computer Performance Engineering

Survivability is a critical attribute of modern computer and communication systems. It is mostly assessed qualitatively, which cannot meet the need for a more precise evaluation of service loss or degradation in the presence of failures, attacks, or disasters. This talk addresses the current state of research on the quantification of survivability. First, we carefully define survivability and contrast it with traditional measures such as reliability, availability, and performability [2, 8, 7]. We use “survivability” as defined by the ANSI T1A1.2 committee: the transient performance from the instant an undesirable event occurs until a steady state with an acceptable performance level is attained [1]. Survivability can thus be seen as a generalization of recovery after a failure or any other undesired event [3]. We then discuss probabilistic models for quantifying survivability under this definition. Next, three case studies illustrate our approach. The first is a quantitative evaluation of several survivable architectures for the plain old telephone system (POTS) [5]. The second deals with the survivability quantification of communication networks [4], and the third with that of smart grid distribution automation networks [6]. In each case, hierarchical models are developed to derive various survivability measures. Numerical results show how such models provide a comprehensive understanding of system behavior after a failure.
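To make the transient definition concrete, the following minimal sketch treats recovery as a small continuous-time Markov chain that starts in the failed state and attaches a service level to each state. The three states, the rates `a` and `b`, the reward values, and the function names are all hypothetical choices for illustration; they are not taken from the talk or its case studies. Uniformization is used here as one standard way to obtain transient state probabilities, not necessarily the method used by the authors.

```python
import math

# Hypothetical recovery model (illustrative assumption only):
# state 0 = failed, state 1 = degraded service, state 2 = fully restored.
a, b = 2.0, 1.0                      # assumed repair rates per hour
Q = [[-a,   a, 0.0],                 # generator matrix; each row sums to zero
     [0.0, -b,   b],
     [0.0, 0.0, 0.0]]                # restored state is absorbing in this sketch
REWARDS = [0.0, 0.5, 1.0]            # assumed fraction of service delivered per state


def transient_distribution(Q, p0, t, eps=1e-12, max_terms=10_000):
    """State probabilities at time t, computed by uniformization."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))          # uniformization rate
    if q == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the uniformized discrete-time chain: P = I + Q/q.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)                    # Poisson probability of 0 jumps
    v = list(p0)                                 # p0 * P^k, starting with k = 0
    result = [weight * x for x in v]
    total, k = weight, 0
    # Add Poisson-weighted terms until the neglected tail mass is below eps.
    while total < 1.0 - eps and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


def expected_service(t):
    """Expected service level t hours after the failure (transient reward)."""
    p = transient_distribution(Q, [1.0, 0.0, 0.0], t)
    return sum(pi * r for pi, r in zip(p, REWARDS))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"t = {t:3.1f} h  expected service level = {expected_service(t):.4f}")
```

Evaluating `expected_service` over a grid of times traces the recovery curve from the failure instant toward steady state, which is the kind of transient measure the definition above refers to. The chain is conditioned on the failure having already occurred, so no failure rate appears in the generator.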

[1] John Murphy, et al. Detecting Performance Antipatterns in Component Based Enterprise Systems, 2008, J. Object Technol.

[2] Helen D. Karatza, et al. An M/M/2 parallel system model with pure space sharing among rigid jobs, 2007, Math. Comput. Model.

[3] Alexander S. Rumyantsev. An HPC Upgrade/Downgrade that Provides Workload Stability, 2015, PaCT.

[4] Will Cappelli. Magic Quadrant for Application Performance Monitoring, 2010.

[5] Qi-Ming He, et al. Fundamentals of Matrix-Analytic Methods, 2013, Springer New York.

[6] Alexander S. Rumyantsev, et al. Stability criterion of a multiserver model with simultaneous service, 2017, Ann. Oper. Res.

[7] Wilhelm Hasselbring, et al. WESSBAS: extraction of probabilistic workload specifications for load testing and performance prediction—a model-driven approach for session-based application systems, 2016, Software & Systems Modeling.

[8] Alexander S. Rumyantsev, et al. Accelerated verification of stability of simultaneous service multiserver systems, 2015, 2015 7th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT).

[9] Nico M. van Dijk, et al. Blocking of finite source inputs which require simultaneous servers with general think and holding times, 1989.

[10] Vaidyanathan Ramaswami, et al. Introduction to Matrix Analytic Methods in Stochastic Modeling, 1999, ASA-SIAM Series on Statistics and Applied Mathematics.

[11] Alexander S. Rumyantsev, et al. A Regeneration-Based Estimation of High Performance Multiserver Systems, 2016, CN.

[12] Guy Latouche, et al. Semi-explicit solutions for M/PH/1-like queuing systems, 1983.

[13] James R. Larus, et al. Software and the Concurrency Revolution, 2005, ACM Queue.

[14] Marcel F. Neuts, et al. Matrix-Geometric Solutions in Stochastic Models, 1981.

[15] Connie U. Smith, et al. PMIF+: Extensions to Broaden the Scope of Supported Models, 2013, EPEW.

[16] Samuel Kounev, et al. Asking "What"?, Automating the "How"?: The Vision of Declarative Performance Engineering, 2016, ICPE.

[17] Helen D. Karatza, et al. Two-server parallel system with pure space sharing and Markovian arrivals, 2013, Comput. Oper. Res.

[18] Oliver C. Ibe, et al. Markov processes for stochastic modeling, 2008.

[19] Awi Federgruen, et al. An M/G/c queue in which the number of servers required is random, 1984.

[20] Dror G. Feitelson, et al. Workload Modeling for Computer Systems Performance Evaluation, 2015.

[21] Marcel F. Neuts, et al. Local poissonification of the markovian arrival process, 1992.

[22] Wilhelm Hasselbring, et al. Trace-Context Sensitive Performance Profiling for Enterprise Software Applications, 2008, SIPEW.

[23] Alan Scheller-Wolf, et al. Sink or swim together: necessary and sufficient conditions for finite moments of workload components in FIFO multiserver queues, 2011, Queueing Syst. Theory Appl.

[24] Percy H. Brill, et al. Queues in Which Customers Receive Simultaneous Service from a Random Number of Servers: A System Point Approach, 1984.