Lessons from the PSTN for Dependable Computing

The Public Switched Telephone Network (PSTN) is a large, complex, distributed system with strong dependability guarantees. As users come to expect greater dependability from computer systems, studying comparable systems such as the PSTN can provide valuable insight into failure modes and dependability techniques. In this paper, we present the failure reporting methodology used by the PSTN, consider different metrics for reporting availability, and find that human error is the most significant cause of PSTN unavailability. Designers of computer systems can use this failure data as quantitative evidence for predicting why computer systems fail. More importantly, they can learn from how PSTN failure data is collected and how the impact of failures is measured.

General Terms: Measurement, Documentation, Reliability, Human Factors