Unified debugging of distributed systems with Recon

To scale to today's complex distributed software systems, debugging and replay techniques mostly focus on single facets of software, e.g., local concurrency, distributed messaging, or data representation. This forces developers to tediously combine different technologies, such as instruction-level dynamic tracing, event log analysis, and global state reconstruction, to gradually explain non-trivial defects.
