Seeing Through Black Boxes: Tracking Transactions through Queues under Monitoring Resource Constraints

We consider the problem of optimally allocating monitoring resources to track transactions as they progress through a distributed system modeled as a queueing network. Two forms of monitoring information are considered: locally unique transaction identifiers, and the arrival and departure timestamps of transactions at each processing queue. Timestamps are assumed to be available at every queue, but without identifiers they enable only imprecise tracking, because parallel processing can cause transactions to depart out of order. Identifiers, by contrast, enable precise tracking but are available only at queues that have been properly instrumented. Given an instrumentation budget, only a subset of queues can be selected to produce identifiers, while the remaining queues must fall back on imprecise, timestamp-based tracking. The goal is to allocate the instrumentation budget so as to maximize overall tracking accuracy. The challenge is that the optimal allocation depends on the accuracy of timestamp-based tracking at each queue, which in turn depends in complex ways on the arrival and service processes and on the queueing discipline. We propose two simple allocation heuristics that predict the order of the timestamp-based tracking accuracies of the queues. Using the stochastic comparison of queues, we derive sufficient conditions under which these heuristics are optimal. Simulations show that the heuristics remain close to optimal even when the parameters deviate from these conditions.
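The interplay between parallel servers, out-of-order departures, and budget allocation can be illustrated with a small simulation. The sketch below is not the paper's method. It rests on several assumptions that the abstract does not make. Each queue is treated as an M/M/c queue with first-come-first-served dispatch, simulated in isolation. Timestamp-based tracking uses a naive rule that pairs the i-th departure with the i-th arrival. Per-hop errors are treated as independent, so end-to-end accuracy is the product of the accuracies of the uninstrumented queues. Under that product model, instrumenting the `budget` least accurate queues is optimal. The function names `fifo_matching_accuracy` and `allocate_budget` are hypothetical. Unlike the proposed heuristics, which predict the accuracy order analytically, this sketch estimates the accuracies by brute-force simulation.

```python
import heapq
import random


def fifo_matching_accuracy(num_servers, arrival_rate, service_rate,
                           num_jobs=20000, seed=0):
    """Estimate timestamp-only tracking accuracy at one queue.

    Hypothetical model: Poisson arrivals, exponential service, `num_servers`
    parallel servers, first-come-first-served dispatch (an M/M/c queue).
    Naive tracking rule (an assumption): pair the i-th departure with the
    i-th arrival. Returns the fraction of jobs paired correctly.
    """
    rng = random.Random(seed)
    free_at = [0.0] * num_servers  # min-heap: when each server next frees up
    t = 0.0
    departures = []  # (departure_time, arrival_index)
    for job in range(num_jobs):
        t += rng.expovariate(arrival_rate)        # this job's arrival time
        start = max(t, heapq.heappop(free_at))    # take earliest-free server
        finish = start + rng.expovariate(service_rate)
        heapq.heappush(free_at, finish)
        departures.append((finish, job))
    departures.sort()
    # A job is tracked correctly iff its departure rank equals its arrival rank.
    correct = sum(1 for rank, (_, job) in enumerate(departures) if rank == job)
    return correct / num_jobs


def allocate_budget(accuracies, budget):
    """Choose which queues to instrument with identifiers.

    Assumption: instrumented queues are tracked perfectly and per-hop errors
    are independent, so overall accuracy is the product of the accuracies of
    the uninstrumented queues. The product is maximized by instrumenting the
    `budget` queues with the lowest timestamp-based accuracy.
    """
    order = sorted(range(len(accuracies)), key=lambda q: accuracies[q])
    chosen = set(order[:budget])
    overall = 1.0
    for q, acc in enumerate(accuracies):
        if q not in chosen:
            overall *= acc
    return sorted(chosen), overall


if __name__ == "__main__":
    # Hypothetical path of four queues, each at utilization 0.8 with a common
    # arrival rate; more servers means more room for overtaking.
    servers = [1, 2, 4, 8]
    accs = [fifo_matching_accuracy(c, arrival_rate=0.8, service_rate=0.8 / (0.8 * c))
            for c in servers]
    print([round(a, 3) for a in accs])
    print(allocate_budget(accs, budget=2))
```

A single-server FCFS queue never reorders transactions, so the naive rule is exact there. The rule degrades as the number of parallel servers grows, which is why those queues are the natural candidates for identifiers under this toy model.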
