Deterministic replay for message-passing-based concurrent programs

The Multicore Communications API (MCAPI) is a new message-passing API released by the Multicore Association. MCAPI provides an interface designed for closely distributed embedded systems with multiple cores on a chip and/or chips on a board. As with parallel programs in other domains, debugging MCAPI programs is challenging because their behavior is nondeterministic. In this article, we present a tool that deterministically replays MCAPI program executions, giving MCAPI developers valuable insight when a failure occurs.
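To make the idea of deterministic replay concrete, the following is a minimal sketch of the general record-and-replay technique for message arrival order. It is not the tool described in this article and it does not use the MCAPI API: the functions `record` and `replay`, the `(sender, payload)` message shape, and the sample data are all hypothetical, and the sketch assumes that messages from the same sender always arrive in the same relative order.

```python
from collections import deque


def record(arrivals):
    """Record phase: log the sender of each message in observed arrival order."""
    return [sender for sender, _payload in arrivals]


def replay(arrivals, trace):
    """Replay phase: deliver messages in the logged sender order.

    A message that arrives before its turn is held in a per-sender buffer
    until the trace says that sender is next.
    """
    pending = {}                      # sender -> deque of held-back payloads
    incoming = iter(arrivals)
    delivered = []
    for expected in trace:
        queue = pending.setdefault(expected, deque())
        # Keep receiving until a message from the expected sender is available.
        while not queue:
            message = next(incoming, None)
            if message is None:
                raise ValueError("run ended before the trace was satisfied")
            sender, payload = message
            pending.setdefault(sender, deque()).append(payload)
        delivered.append((expected, queue.popleft()))
    return delivered


# Hypothetical data: the same three messages arrive in a different order
# in each run, which is the nondeterminism that makes debugging hard.
first_run = [("B", "b0"), ("A", "a0"), ("B", "b1")]
second_run = [("A", "a0"), ("B", "b0"), ("B", "b1")]

trace = record(first_run)
replayed = replay(second_run, trace)
```

Here `replayed` equals `first_run`: the second run delivers messages to the application in the order seen during the recorded run, even though they arrived differently. Logging only sender identities rather than payloads keeps the trace small, at the cost of relying on the per-sender ordering assumption stated above.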
