Comprehending performance from real-world execution traces: a device-driver case

Real-world execution traces record performance problems that are likely perceived at deployment sites. However, those problems can be rooted subtly and deeply in system layers or in other components far from the place where delays are initially observed. To tackle the challenge of identifying such deeply rooted problems, we propose a new trace-based approach consisting of two steps: impact analysis and causality analysis. The impact analysis measures performance impact on a per-component basis, and the causality analysis discovers patterns of runtime behavior that are likely to cause the measured impact. The discovered patterns can help performance analysts quickly identify the root causes of perceived performance problems. We instantiate our approach to study the performance of device drivers on over 19,500 real-world execution traces. The impact analysis shows that device drivers account for a non-trivial part (≈ 38%) of overall system performance, and that a big part (≈ 26%) is due to interactions between drivers. The causality analysis effectively discovers highly suspicious, high-impact behavioral patterns in device drivers, which were examined and confirmed by our automated evaluation, developers, and performance analysts.
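The two steps can be illustrated with a minimal sketch. It is not the paper's implementation: the trace layout (lists of `(component, duration_ms)` pairs), the slow/fast split of traces, the `min_ratio` threshold, and the function names `component_impact` and `contrast_patterns` are all assumptions made for illustration. The first function computes each component's share of recorded time as a simple stand-in for impact analysis; the second keeps behavioral patterns that occur disproportionately often in slow traces as a simple stand-in for causality analysis.

```python
from collections import defaultdict


def component_impact(traces):
    """Return each component's share of the total recorded time.

    `traces` is a hypothetical format: a list of traces, each a list of
    (component, duration_ms) pairs.
    """
    totals = defaultdict(float)
    for trace in traces:
        for component, duration_ms in trace:
            totals[component] += duration_ms
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {c: t / grand_total for c, t in totals.items()}


def contrast_patterns(slow, fast, min_ratio=2.0):
    """Return patterns that are at least `min_ratio` times more frequent
    in slow traces than in fast ones, most frequent first.

    `slow` and `fast` are hypothetical inputs: lists of traces, each
    reduced to a list of pattern labels. `min_ratio` is an assumed cutoff.
    """
    def frequency(group):
        if not group:
            return {}
        counts = defaultdict(int)
        for patterns in group:
            for p in set(patterns):  # count each pattern once per trace
                counts[p] += 1
        return {p: n / len(group) for p, n in counts.items()}

    slow_f = frequency(slow)
    fast_f = frequency(fast)
    suspicious = [
        p for p, f in slow_f.items()
        if f >= min_ratio * fast_f.get(p, 0.0)
    ]
    return sorted(suspicious, key=lambda p: (-slow_f[p], p))
```

In this sketch a pattern absent from the fast traces always qualifies, which favors recall over precision; a real analysis would likely also weight patterns by their measured impact.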
