Fault Tolerant Architectures - Past, Present, and (?) Future

It is argued that fault tolerance is not only needed in the computer marketplace but that this need is in fact growing. This holds even though computer hardware has become orders of magnitude more reliable over the last four decades and, by at least some accounts, most computer outages are due to factors other than hardware problems (software bugs, operator errors). It is also argued that while techniques for detecting hardware faults, and to some extent software faults, are well understood, much remains to be discovered about recovering from detected faults without corrupting data or losing program continuity.