A Performance Debugging Framework for Unnecessary Lock Contentions with Record/Replay Techniques

Locks are widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) arise during program execution on multicore processors, incurring significant performance overhead. This paper presents PERFPLAY, a performance debugging framework that helps identify unnecessary lock contentions and guides programmers in improving program performance by eliminating them. Because performance debugging of unnecessary lock contentions is input-sensitive, we first identify representative inputs for performance debugging. Next, PERFPLAY quantifies the performance impact of each unnecessary lock contention code region for each candidate input. Because performance impact and input coverage often conflict in real-world programs, we finally trade off the two to recommend the optimal unnecessary lock contention code regions. Our results on five real-world programs and the PARSEC benchmarks demonstrate the significant performance overhead of unnecessary lock contentions, and the effectiveness of PERFPLAY in pinpointing the target unnecessary lock contention code regions while considering both performance impact and input coverage.
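The final tradeoff step can be illustrated with a small sketch. The abstract does not say how PERFPLAY combines the two criteria, so the following is only one plausible reading, not the authors' method: it assumes each candidate code region carries an estimated performance impact and an input coverage value, keeps the Pareto-optimal candidates, and then ranks them with a weighted sum. The `Region` type, the `weight` parameter, and all region names and numbers are hypothetical.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A candidate code region (all fields are illustrative assumptions)."""
    name: str        # hypothetical label for the code region
    impact: float    # estimated performance gain if the contention is removed
    coverage: float  # fraction of representative inputs that exhibit it


def dominates(a: Region, b: Region) -> bool:
    """True if a is at least as good as b on both criteria and better on one."""
    return (a.impact >= b.impact and a.coverage >= b.coverage
            and (a.impact > b.impact or a.coverage > b.coverage))


def pareto_front(regions: List[Region]) -> List[Region]:
    """Keep only the regions that no other region dominates."""
    return [r for r in regions if not any(dominates(o, r) for o in regions)]


def recommend(regions: List[Region], weight: float = 0.5) -> Region:
    """Pick one region from the Pareto front using a weighted sum.

    weight is an assumed knob: values near 1 favor performance impact,
    values near 0 favor input coverage.
    """
    front = pareto_front(regions)
    return max(front, key=lambda r: weight * r.impact + (1 - weight) * r.coverage)


# Made-up candidates, for illustration only.
candidates = [
    Region("region_a", impact=0.30, coverage=0.20),
    Region("region_b", impact=0.10, coverage=0.90),
    Region("region_c", impact=0.20, coverage=0.60),
    Region("region_d", impact=0.05, coverage=0.50),  # dominated by region_b and region_c
]
```

With these made-up numbers, `pareto_front(candidates)` drops `region_d`, and the recommendation shifts from `region_b` to `region_a` as `weight` moves toward performance impact, which is the conflict between the two criteria that the abstract describes.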
