Generating Robust Parallel Programs via Model Driven Prediction of Compiler Optimizations for Non-determinism

Execution orders in parallel programs are governed by non-determinism and can vary substantially across executions, even on the same input. Thus, a highly non-deterministic program can exhibit rare execution orders that were never observed during testing. It is desirable to reduce non-determinism in the production cycle, to suppress corner-case behavior (making the execution robust or bug-free), and to increase it in the development cycle, to help reproduce bugs. Performance-wise, different optimization levels (e.g., O0 to O3) are enabled during development; non-determinism-wise, however, developers have no way to select the right compiler optimization level to increase non-determinism for debugging or to decrease it for robustness. The major source of non-determinism is the underlying execution model, which is primarily determined by the processor architecture and the operating system (OS). Architectural artifacts such as cache misses and TLB misses characterize and shape this non-determinism. In this work, we capture these sources of non-determinism through an architectural model based on hardware performance counters, and we use the model to predict the compiler optimization level that generates a robust parallel program, i.e., one with minimal non-determinism in production. As a side effect, the model also allows maximizing non-determinism for debugging purposes. We demonstrate our technique on the PARSEC benchmark suite and, among other results, show that the generated robust program decreases non-deterministic behavior by up to 66.48%. As a practical measure, we also show that a known race condition, as well as randomly injected ones, are rendered benign in the robust parallel program generated by our framework.
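The abstract does not detail the model itself. As a rough illustration only, the Python sketch below replaces the learned predictor with a simpler, directly measured proxy: it treats the run-to-run coefficient of variation (CV) of hardware performance counters (here the PAPI preset events `PAPI_L1_DCM` and `PAPI_TLB_DM`, i.e., L1 data-cache and data-TLB misses) as a non-determinism score, then picks the optimization level with the lowest score for a robust production build, or the highest score for debugging. The function names, the scoring rule, and the counter values are hypothetical assumptions, not the authors' method or data.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def nondeterminism_score(runs):
    """Hypothetical score: average per-counter CV across repeated runs.

    `runs` is a list of dicts, one per execution of the same binary on the
    same input, mapping a counter name (e.g. "PAPI_L1_DCM") to its value.
    """
    counters = runs[0].keys()
    return mean(coefficient_of_variation([run[c] for run in runs]) for c in counters)


def select_opt_level(profiles, robust=True):
    """Pick the level with the lowest score (robust build) or highest (debugging).

    `profiles` maps an optimization level (e.g. "O0") to its list of runs.
    Returns (chosen_level, {level: score}).
    """
    scores = {level: nondeterminism_score(runs) for level, runs in profiles.items()}
    chooser = min if robust else max
    return chooser(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up counter values for illustration only; not measurements from the paper.
    profiles = {
        "O0": [{"PAPI_L1_DCM": 1000, "PAPI_TLB_DM": 50},
               {"PAPI_L1_DCM": 1400, "PAPI_TLB_DM": 70}],
        "O3": [{"PAPI_L1_DCM": 800, "PAPI_TLB_DM": 40},
               {"PAPI_L1_DCM": 820, "PAPI_TLB_DM": 41}],
    }
    print(select_opt_level(profiles)[0])                # → O3
    print(select_opt_level(profiles, robust=False)[0])  # → O0
```

Measuring every level directly, as this sketch does, requires repeated runs of each build; a predictive model like the one the abstract describes would presumably let the choice be made with fewer measurements.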
