Facing up to Faults

After the war, Turing re-appeared on the public scene, so to speak, and was instrumental in initiating the work at the National Physical Laboratory (NPL) on electronic computers. This provides me with another point of contact with him, since (aside from a brief flirtation with the IBM 650, a boringly easy computer to program) my initial years as a programmer were spent trying to cope with the English Electric DEUCE computer. This computer was a direct descendant of the machine that Turing designed at NPL in the early years after the war. Thanks to Turing’s design, DEUCE was typically much faster in operation than its rivals, albeit almost entirely at the expense of its programmers. Such was the innocence of youth that I and my colleagues actually enjoyed its intricacies, and the problem of finding ways of automating, at least partially, the programming task. Indeed, we felt that contemporary American computer developments, by IBM and others, such as the provision of what seemed to us to be huge memories, and of floating point arithmetic hardware, were in effect cheating. Certainly they were depriving compiler writers such as ourselves of interesting and (we thought) worthwhile challenges.
