Latent design faults in the development of the Multiflow TRACE/200

Several examples of design faults that appeared during the development of the Multiflow TRACE/200 series of minisupercomputers are discussed. The design flaws generally fell into a few categories: mistaken interface assumptions, instruction cache problems, parity-related faults, designer errors, CAD tool problems, and defective part designs (especially ground bounce). Examples of bugs in each category are given. Random diagnostics were particularly helpful in detecting several fault classes. The authors conclude with a classification of the severity and time history of the bug categories.

[1]  R.P. Colwell. Latent design faults in the development of Multiflow's TRACE/200, 1992, Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
