SCOZ: A system‐wide causal profiler for multicore systems

The increased complexity of hardware and software makes it difficult to analyze program performance with conventional profilers. Causal profiling was introduced to address this problem: it finds the bottlenecks of a program and quantifies the effect that optimizing each of them would have. COZ, the most recent causal profiler, uses a technique called virtual speedup to perform causal profiling without actually optimizing the program code. However, COZ can profile only multithreaded applications; it cannot profile multiprogram applications or operating system (OS) kernel code, which limits the applicability of causal profiling. This article introduces SCOZ, a system‐wide causal profiler that addresses these limitations. SCOZ changes the target of virtual speedup from threads to CPU cores, thereby extending profiling coverage to diverse applications as well as to OS kernel code. To verify SCOZ, we profiled multithreaded and OS kernel‐intensive applications. For multithreaded applications, SCOZ produces results identical to those of COZ. For OS kernel‐intensive applications, SCOZ identifies the same bottlenecks that previous OS scalability studies pinpointed. Finally, we verified the profiling capability of SCOZ by profiling and optimizing multiprocess applications from the NAS Parallel Benchmarks suite.
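The idea behind virtual speedup can be illustrated with a toy model. The sketch below is our own simplification, not SCOZ's implementation: the names `Segment`, `actual_runtime`, and `virtual_runtime`, the code labels, and the phase/barrier model are all hypothetical. Each CPU core runs a timeline of code segments (from any process, or from the kernel), and all cores meet at a barrier at the end of each phase. Instead of really shortening a target segment by a fraction `speedup`, every time it runs for `t` ms on one core, all other cores are paused for `speedup * t` ms; subtracting the total inserted delay from the measured runtime then yields the runtime a real optimization would have produced. Because the pauses are applied per core rather than per thread, the target may just as well be kernel code shared by several processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    code: str          # hypothetical label, e.g. "procA:compute" or "kernel:lock"
    duration: float    # execution time in ms


# A phase holds one timeline (list of segments) per CPU core; all cores
# synchronize at a barrier when the phase ends.
Phase = list[list[Segment]]


def actual_runtime(phases: list[Phase], target: str, speedup: float) -> float:
    """Runtime if `target` were really made `speedup` (a fraction) faster."""
    total = 0.0
    for cores in phases:
        # The phase ends when the slowest core reaches the barrier.
        total += max(
            sum(s.duration * (1 - speedup) if s.code == target else s.duration
                for s in core)
            for core in cores
        )
    return total


def virtual_runtime(phases: list[Phase], target: str, speedup: float) -> float:
    """Runtime estimated by virtual speedup (simplified model).

    `target` keeps its original duration; instead, each execution of it on
    one core pauses every *other* core for speedup * duration. The estimate
    is the measured runtime minus the total inserted delay.
    """
    measured = 0.0
    total_delay = 0.0
    for cores in phases:
        # Delay generated by target executions on each core.
        own = [speedup * sum(s.duration for s in core if s.code == target)
               for core in cores]
        phase_delay = sum(own)
        # A core is paused for target executions on all cores except itself.
        finish = [sum(s.duration for s in core) + (phase_delay - own[i])
                  for i, core in enumerate(cores)]
        measured += max(finish)
        total_delay += phase_delay
    return measured - total_delay


if __name__ == "__main__":
    # Two cores run two different processes that both enter the same
    # (hypothetical) kernel lock path; one phase, joined at the end.
    phases = [[
        [Segment("procA:compute", 40), Segment("kernel:lock", 20)],  # core 0
        [Segment("procB:compute", 30), Segment("kernel:lock", 10)],  # core 1
    ]]
    print("baseline:", actual_runtime(phases, "", 0.0))  # baseline: 60.0
    for target in ("kernel:lock", "procB:compute"):
        v = virtual_runtime(phases, target, 0.5)
        a = actual_runtime(phases, target, 0.5)
        print(f"{target}: virtual {v:.1f} ms, actual {a:.1f} ms")
    # kernel:lock: virtual 50.0 ms, actual 50.0 ms
    # procB:compute: virtual 60.0 ms, actual 60.0 ms
```

In this example, virtually speeding up the shared kernel lock path by 50% predicts a drop from 60 ms to 50 ms, while speeding up `procB:compute` predicts no gain because core 0 remains on the critical path; this is the kind of answer causal profiling is meant to give. The model only needs per-phase delay totals because every core waits at the barrier; a real profiler must also account for when pauses occur relative to blocking and wake-up events.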
