Towards Occupation Inference in Non-instrumented Services

Measuring the capacity of a real distributed system and its components, and modeling their response to load, requires painstaking instrumentation. Although instrumentation greatly improves observability, it may be undesirable due to cost, or impossible due to legacy constraints. To model how a component responds to load and estimate its maximum capacity, and in turn act in time to preserve quality of service, we need a way to measure component occupation. Recovering the occupation of internal non-instrumented components is therefore extremely useful for system operators, who must ensure the responsiveness of each component and plan resource provisioning. Unfortunately, complex systems often exhibit non-linear responses that resist any simple closed-form decomposition. To achieve this decomposition for small subsets of non-instrumented components, we propose training a neural network that computes their respective occupations. We consider a subsystem composed of two simple sequential components and resort to simulation to evaluate the neural network against an optimal baseline solution. Results show that our approach can indeed infer the occupation of the layers with high accuracy, thus showing that the sampled distribution preserves enough information about the components. Hence, neural networks can improve the observability of online distributed systems in parts that lack instrumentation.
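To make the simulated setting concrete, the following is a minimal sketch of a two-component sequential subsystem, not the simulator used in this work. It rests on several assumptions that the abstract does not state: each component is a single FIFO server with an unbounded queue, arrivals and service times are exponentially distributed, and "occupation" is taken to mean the fraction of time a component is busy. The function name `simulate_tandem` and all parameter names are hypothetical. The sketch only produces the ground-truth occupations and the end-to-end response times that an external observer could sample; the neural network itself is not shown.

```python
import random


def simulate_tandem(n_requests, arrival_rate, service_rate_1, service_rate_2, seed=0):
    """Simulate two FIFO single-server components in sequence.

    Assumptions (illustrative only): Poisson arrivals, exponential service
    times, unbounded queues, occupation = busy time / observed horizon.
    """
    rng = random.Random(seed)
    arrival = 0.0   # arrival time of the current request
    free_1 = 0.0    # time at which component 1 finishes its current work
    free_2 = 0.0    # time at which component 2 finishes its current work
    busy_1 = 0.0    # accumulated service time in component 1
    busy_2 = 0.0    # accumulated service time in component 2
    response_times = []

    for _ in range(n_requests):
        arrival += rng.expovariate(arrival_rate)

        # Component 1 starts when the request has arrived and the server is idle.
        service_1 = rng.expovariate(service_rate_1)
        free_1 = max(arrival, free_1) + service_1

        # Component 2 starts when component 1 is done and component 2 is idle.
        service_2 = rng.expovariate(service_rate_2)
        free_2 = max(free_1, free_2) + service_2

        busy_1 += service_1
        busy_2 += service_2
        # End-to-end response time, the only quantity visible from outside.
        response_times.append(free_2 - arrival)

    horizon = free_2  # departure time of the last request
    return {
        "occupation_1": busy_1 / horizon,
        "occupation_2": busy_2 / horizon,
        "response_times": response_times,
    }


result = simulate_tandem(20000, arrival_rate=5.0, service_rate_1=10.0, service_rate_2=20.0)
print(result["occupation_1"], result["occupation_2"])
```

Under these assumptions, the per-component occupations play the role of the hidden quantities to be inferred, while the list of response times stands in for the externally sampled distribution mentioned above.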
