Transparent logging as a technique for debugging complex distributed systems

As any battle-scarred veteran will testify, debugging a distributed system in production is an enterprise fraught with difficulty and frustration. By the time a system is released for production use, most of the easy bugs have been found and fixed. The remaining bugs are typically non-deterministic and manifest themselves only under conditions of heavy use. Although rare, such bugs cannot be ignored, because they often have serious consequences.

In this position paper, we put forth the thesis that logging is a flexible, powerful, and convenient tool for debugging complex distributed systems. We substantiate this thesis in three steps. First, we argue that logging is particularly well suited to debugging distributed systems. Next, we observe that distributed systems already use logging for reasons independent of debugging. Finally, we show that these existing uses of logging can be transparently extended to support debugging.