Diagnosing Latency in Multi-Tier Black-Box Services

As multi-tier cloud applications become pervasive, we need better tools for understanding their performance. This paper presents a system that analyzes observed or desired changes to end-to-end latency prole in a large distributed application, and identies their underlying causes. It recognizes changes to system conguration, workload, or performance of individual services that lead to the observed or desired outcome. Experiments on an industrial datacenter demonstrate the utility of the system.