Adaptive network/service fault detection in transaction-oriented wide area networks

Algorithms and online software for automated and adaptive detection of network/service anomalies have been developed and field-tested for transaction-oriented wide area networks (WANs). These transaction networks are integral parts of electronic commerce infrastructures. Our adaptive network/service anomaly detection algorithms are demonstrated in a commercially important production WAN, which is currently monitored by our recently implemented real-time software system, TRISTAN (transaction instantaneous anomaly notification). TRISTAN adaptively and proactively detects network/service performance degradations and failures in transaction-oriented networks with multiple service classes, where the performance of the service classes is mutually dependent and correlated, and where external or environmental factors can strongly affect network and service performance. In this paper, we present the architecture, summarize the implemented algorithms, and describe the operation of TRISTAN as deployed in the AT&T transaction access services (TAS) network. TAS is a commercially important, high-volume, multiple-service-class, hybrid telecommunication and data WAN that serves transaction traffic in the USA and neighboring countries. We demonstrate that TRISTAN detects network/service anomalies in TAS effectively. TRISTAN can automatically and dynamically detect network/service faults that can easily elude traditional alarm-based network monitoring systems.