Fundamentals of Distributed System Observation

Determining the order of events in a distributed system is difficult because of the observability problem. The author discusses this problem and evaluates different strategies for determining arrival order. He analyzes four timestamping methods to determine how effectively each contends with observability problems. Although he focuses on distributed systems, the concepts also apply to any system exhibiting concurrency (the appearance of two or more events occurring simultaneously), including multiprocessor machines and uniprocessor multitasking. Events in this context may be the execution of single machine instructions or entire procedures; the level of granularity is unimportant. To define event order, the author uses the idea of causality (the ability of one event to affect another) because it allows us to reason independently of any particular time frame.
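To make the idea of causal ordering concrete, the following is a minimal sketch of vector clocks, one well-known causality-based timestamping scheme. The abstract does not name the four methods it evaluates, so treating vector clocks as representative is an assumption, and the function names (`tick`, `merge`, `happened_before`, `concurrent`) and process identifiers are hypothetical, chosen only for illustration.

```python
from typing import Dict

# A vector clock maps a process id to the number of events
# that process is known to have performed. Missing ids count as 0.
Clock = Dict[str, int]


def tick(clock: Clock, pid: str) -> Clock:
    """Return a copy of `clock` advanced by one local event at `pid`."""
    new = dict(clock)
    new[pid] = new.get(pid, 0) + 1
    return new


def merge(local: Clock, received: Clock, pid: str) -> Clock:
    """Timestamp a receive event at `pid`: take the element-wise
    maximum of both clocks, then count the receive itself."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, pid)


def _leq(a: Clock, b: Clock) -> bool:
    """True if every component of `a` is <= the same component of `b`."""
    return all(count <= b.get(k, 0) for k, count in a.items())


def happened_before(a: Clock, b: Clock) -> bool:
    """True if the event stamped `a` could have affected the one stamped `b`."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event could have affected the other."""
    return not _leq(a, b) and not _leq(b, a)


# Process P sends a message; process Q performs a local event,
# then receives P's message.
p_send = tick({}, "P")
q_local = tick({}, "Q")
q_recv = merge(q_local, p_send, "Q")

print(happened_before(p_send, q_recv))  # the send can affect the receive
print(concurrent(p_send, q_local))      # no causal link in either direction
```

Under these assumptions, the comparison uses only the counters carried with each event, never a wall-clock reading, which is what lets the ordering be reasoned about independently of any particular time frame.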
