SPIN model checking: an introduction

A long-standing and elusive problem in software engineering is to devise reliable means of mechanically checking the correctness of distributed systems code. Writing reliable distributed code is notoriously difficult, so locating the inevitable bugs in such code is important. As is well known, traditional testing methods are of little use in this context, because the most pernicious bugs typically depend on subtle race conditions that produce peculiar and unexpected interleavings of events. Familiar examples are the deadlock and starvation problems that plagued the designers of the first distributed systems code in the 1960s and 1970s (see for instance [16], p. 155). In simple cases, a small set of strictly enforced rules can preserve sanity in a system. One such rule is the requirement that frequently used resources in an operating system be allocated only in a fixed order, to prevent circular waiting. But simple rules cover only the known problems. Each new system creates a new context, with its own peculiarities and hazards. This is illustrated by the well-publicized hangup problem in the control software of the Mars Pathfinder a few years ago [11]. In retrospect, the hangup scenario could be understood in very simple terms, yet it was missed even in the long and unusually thorough (but traditional) software testing process that had been used.
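
The fixed-order allocation rule mentioned above can be made concrete with a small sketch. It is illustrative only: Python threads stand in for operating system processes, and the names `OrderedLock`, `acquire_in_order`, `release_all`, and `run_demo` are hypothetical, not taken from any system discussed here. Each resource is given a rank, and every thread acquires resources in ascending rank regardless of the order in which it asked for them, so a cycle of threads each waiting on the next cannot form.

```python
import threading


class OrderedLock:
    """A lock with a fixed rank; ranks define the global acquisition order."""

    def __init__(self, rank):
        self.rank = rank
        self._lock = threading.Lock()


def acquire_in_order(*locks):
    """Acquire the locks sorted by rank, whatever order the caller listed them in."""
    ordered = sorted(locks, key=lambda lock: lock.rank)
    for lock in ordered:
        lock._lock.acquire()
    return ordered


def release_all(ordered):
    """Release the locks in the reverse of the order they were acquired."""
    for lock in reversed(ordered):
        lock._lock.release()


def run_demo(rounds=1000):
    """Two threads request the same two resources in opposite orders."""
    disk, printer = OrderedLock(1), OrderedLock(2)
    counter = {"n": 0}

    def worker(first, second):
        for _ in range(rounds):
            held = acquire_in_order(first, second)
            try:
                counter["n"] += 1  # protected: both locks are held here
            finally:
                release_all(held)

    t1 = threading.Thread(target=worker, args=(disk, printer))
    t2 = threading.Thread(target=worker, args=(printer, disk))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter["n"]
```

If each worker instead acquired its locks in the order requested, the two threads could each hold one lock while waiting for the other. As the text notes, a rule of this kind guards only against the hazard it was written for.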