A randomized scheduler with probabilistic guarantees of finding bugs

This paper presents a randomized scheduler for finding concurrency bugs. Like current stress-testing methods, it repeatedly runs a given test program with supplied inputs. However, it improves on stress-testing by finding buggy schedules more effectively and by quantifying the probability of missing concurrency bugs. Key to its design is the characterization of the depth of a concurrency bug as the minimum number of scheduling constraints required to find it. In a single run of a program with n threads and k steps, our scheduler detects a concurrency bug of depth d with probability at least 1/nkd-1. We hypothesize that in practice, many concurrency bugs (including well-known types such as ordering errors, atomicity violations, and deadlocks) have small bug-depths, and we confirm the efficiency of our schedule randomization by detecting previously unknown and known concurrency bugs in several production-scale concurrent programs.

[1]  Koushik Sen,et al.  Effective random testing of concurrent programs , 2007, ASE.

[2]  Matteo Frigo,et al.  The implementation of the Cilk-5 multithreaded language , 1998, PLDI.

[3]  Thomas Ball,et al.  Finding and Reproducing Heisenbugs in Concurrent Programs , 2008, OSDI.

[4]  Min Xu,et al.  A "flight data recorder" for enabling full-system multiprocessor deterministic replay , 2003, ISCA '03.

[5]  Sanjay Bhansali,et al.  Framework for instruction-level tracing and analysis of program executions , 2006, VEE '06.

[6]  Yuanyuan Zhou,et al.  CTrigger: exposing atomicity violation bugs from their hiding places , 2009, ASPLOS.

[7]  Horatiu Jula,et al.  Deadlock Immunity: Enabling Systems to Defend Against Deadlocks , 2008, OSDI.

[8]  Madan Musuvathi,et al.  Iterative context bounding for systematic testing of multithreaded programs , 2007, PLDI '07.

[9]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[10]  Patrice Godefroid,et al.  Model checking for programming languages using VeriSoft , 1997, POPL '97.

[11]  Jong-Deok Choi,et al.  Isolating failure-inducing thread schedules , 2002, ISSTA '02.

[12]  Yuanyuan Zhou,et al.  Learning from mistakes: a comprehensive study on real world concurrency bug characteristics , 2008, ASPLOS.

[13]  Eitan Farchi,et al.  Concurrent bug patterns and how to test them , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[14]  Serdar Tasiran,et al.  Goldilocks: a race and transaction-aware java runtime , 2007, PLDI '07.

[15]  Brandon Lucia,et al.  DMP: deterministic shared memory multiprocessing , 2009, IEEE Micro.

[16]  Stephen N. Freund,et al.  FastTrack: efficient and precise dynamic race detection , 2009, PLDI '09.

[17]  Eitan Farchi,et al.  Framework for testing multi‐threaded Java programs , 2003, Concurr. Comput. Pract. Exp..

[18]  Yosi Ben-Asher,et al.  Producing scheduling that causes concurrent programs to fail , 2006, PADTAD '06.

[19]  Yuanyuan Zhou,et al.  AVIO: Detecting Atomicity Violations via Access-Interleaving Invariants , 2007, IEEE Micro.

[20]  Min Xu,et al.  A serializability violation detector for shared-memory server programs , 2005, PLDI '05.

[21]  Thomas J. LeBlanc,et al.  Debugging Parallel Programs with Instant Replay , 1987, IEEE Transactions on Computers.

[22]  Josep Torrellas,et al.  SigRace: signature-based data race detection , 2009, ISCA '09.

[23]  Samuel T. King,et al.  ReVirt: enabling intrusion analysis through virtual-machine logging and replay , 2002, OPSR.

[24]  Koushik Sen,et al.  A randomized dynamic program analysis technique for detecting real deadlocks , 2009, PLDI '09.

[25]  Joseph L. Hellerstein,et al.  Achieving Service Rate Objectives with Decay Usage Scheduling , 1993, IEEE Trans. Software Eng..

[26]  Richard H. Carver,et al.  Replay and testing for concurrent programs , 1991, IEEE Software.

[27]  Satish Narayanasamy,et al.  A case for an interleaving constrained shared-memory multi-processor , 2009, ISCA '09.

[28]  Peter M. Chen,et al.  Execution replay of multiprocessor virtual machines , 2008, VEE '08.

[29]  Marek Olszewski,et al.  Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.