Low Overhead Hardware-Assisted Virtual Machine Analysis and Profiling

Cloud infrastructure providers need reliable performance analysis tools for their nodes, and analyzing Virtual Machines (VMs) is a major requirement in quantifying cloud performance. However, root cause analysis of unexpected crashes or anomalous behavior in VMs remains a major challenge. Modern tracing tools such as LTTng allow fine-grained analysis, but they incur execution overhead and are OS dependent. In this paper, we propose HAVAna, a hardware-assisted VM analysis algorithm that gathers and analyzes pure hardware trace data, without any dependence on the underlying OS or performance analysis infrastructure. Our approach is entirely non-intrusive and does not require gathering any performance statistics, traces, or logs from the VM. We used the recently introduced Intel PT ISA extensions on modern Intel Skylake processors to demonstrate the efficiency of HAVAna and observed that, in our experimental scenarios, it incurs an overhead of at most 1%, compared to 3.6-28.7% for similar VM trace analysis done with software-only schemes such as LTTng. Our proposed VM trace analysis algorithm has also been open-sourced so that other developers can use and extend it. Furthermore, we developed interactive Resource and Process Control Flow visualization tools to analyze the hardware trace data, and we present a real-life use case in which these tools revealed unexpected resource consumption by VMs.
