Automatic detection of concurrency bugs through event ordering constraints

Writing correct parallel software for modern multiprocessor systems-on-chip (MPSoCs) is difficult. Programmers can rarely anticipate all possible external and internal interactions in complex concurrent systems, and concurrency bugs originating from races and improper synchronization are hard to understand and reproduce. Traditional debug and verification practices for embedded systems lack support for addressing this issue efficiently: programmers must still step through several executions until they reach a buggy state, or analyze complex traces, and both cost productivity. This paper proposes a new debug approach for MPSoCs that combines dynamic analysis with the benefits of virtual platforms. The approach (i) enables automatic exploration of software behavior, (ii) identifies problematic concurrent interactions, (iii) provokes possibly erroneous executions and, ultimately, (iv) detects concurrency bugs. It is demonstrated on an industrial-strength virtual platform running a full Linux operating system and real-world parallel benchmarks.
