Jitter-Trace: a low-overhead OS noise tracing tool based on Linux Perf

Operating System (OS) noise is a well-known phenomenon in which OS activities interfere with the execution of large-scale parallel applications. Because of OS noise, feature-rich software environments such as Linux can seriously limit scalability. Kernel tracing can be used to identify OS noise sources, but until recently it required substantial OS modifications. This paper presents Jitter-Trace, a low-overhead tool that identifies and quantifies jitter sources. Jitter-Trace calculates the jitter generated by each OS activity and provides a complete set of task profiles and histograms of OS noise. This data is essential for implementing OS noise mitigation strategies and reducing the impact of noise on scalability. Jitter-Trace leverages the tracing and profiling capabilities of Linux Perf, which is widely available in current Linux distributions. Perf is tightly integrated into the Linux kernel and features a lightweight implementation.
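As an illustration of what a per-activity profile and a noise histogram might look like, the following is a minimal sketch and not Jitter-Trace's actual implementation. It assumes that trace events have already been reduced to (activity, start, end) tuples in nanoseconds; the function names `profile_noise` and `histogram`, the tuple layout, and the sample activity names are all hypothetical.

```python
from collections import defaultdict


def profile_noise(events):
    """Aggregate noise per activity.

    events: iterable of (activity, start_ns, end_ns) tuples (assumed format).
    Returns {activity: {"count", "total_ns", "max_ns"}}.
    """
    profile = defaultdict(lambda: {"count": 0, "total_ns": 0, "max_ns": 0})
    for activity, start_ns, end_ns in events:
        duration = end_ns - start_ns
        if duration < 0:
            raise ValueError(f"event for {activity!r} ends before it starts")
        entry = profile[activity]
        entry["count"] += 1
        entry["total_ns"] += duration
        entry["max_ns"] = max(entry["max_ns"], duration)
    return dict(profile)


def histogram(events, bin_width_ns):
    """Count events per duration bin; keys are the lower edge of each bin."""
    bins = defaultdict(int)
    for _, start_ns, end_ns in events:
        lower_edge = (end_ns - start_ns) // bin_width_ns * bin_width_ns
        bins[lower_edge] += 1
    return dict(sorted(bins.items()))


# Hypothetical sample data: two timer interrupts and one kernel worker run.
sample_events = [
    ("timer_irq", 0, 2000),
    ("timer_irq", 10000, 13000),
    ("kworker", 5000, 5500),
]
sample_profile = profile_noise(sample_events)
sample_histogram = histogram(sample_events, 1000)
```

In this sketch the profile ranks activities by total interruption time, while the histogram shows how interruption lengths are distributed, which is the kind of information a mitigation strategy would need in order to decide which activities to target.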
