The application of knowledge-based techniques to the monitoring of computers in a large heterogeneous distributed environment

Historically, computer monitoring tools have sent alarms to a central point, typically a computer operator. This approach has been successful in environments with a small number of processors. However, future CERN systems will contain thousands of heterogeneous distributed processors, and it may prove impossible for an operator to cope with the resulting volume of alarms. Consequently, monitoring of the proposed environment has become a research focus. One solution to this problem is to encode monitoring knowledge that correlates alarms and produces fewer, but more meaningful, advisory messages. Moreover, such a solution could reduce operator load further by taking automatic corrective action.
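To make the idea of alarm correlation concrete, the following is a minimal sketch and not the system described here. It assumes a single, very simple rule: alarms of the same kind raised by several distinct hosts within a short time window are likely to share a cause and can be reported as one advisory. The names `Alarm`, `correlate`, `window` and `threshold`, and the alarm kinds in the usage example, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    host: str    # processor that raised the alarm (hypothetical field)
    kind: str    # alarm type, e.g. "nfs_timeout" (hypothetical value)
    time: float  # seconds since some reference point


def _summarise(kind, burst, threshold):
    """Turn one burst of same-kind alarms into advisory strings."""
    hosts = sorted({a.host for a in burst})
    if len(hosts) >= threshold:
        # Enough distinct hosts: report a single correlated advisory.
        return [f"{kind}: {len(hosts)} hosts affected, suspect a shared cause"]
    # Too few hosts to correlate: pass the alarms through individually.
    return [f"{kind} on {a.host}" for a in burst]


def correlate(alarms, window=60.0, threshold=3):
    """Collapse bursts of same-kind alarms into fewer advisories.

    A burst is a run of same-kind alarms whose times all lie within
    `window` seconds of the first alarm in the run.
    """
    by_kind = defaultdict(list)
    for alarm in sorted(alarms, key=lambda a: a.time):
        by_kind[alarm.kind].append(alarm)

    advisories = []
    for kind, group in by_kind.items():
        burst = [group[0]]
        for alarm in group[1:]:
            if alarm.time - burst[0].time <= window:
                burst.append(alarm)
            else:
                advisories.extend(_summarise(kind, burst, threshold))
                burst = [alarm]
        advisories.extend(_summarise(kind, burst, threshold))
    return advisories


if __name__ == "__main__":
    alarms = [
        Alarm("n1", "nfs_timeout", 0.0),
        Alarm("n2", "nfs_timeout", 5.0),
        Alarm("n3", "nfs_timeout", 10.0),
        Alarm("n4", "disk_full", 20.0),
    ]
    for advisory in correlate(alarms):
        print(advisory)
```

In this sketch four raw alarms become two advisories. A knowledge-based system would replace the fixed grouping rule with richer knowledge about the environment, and could attach a corrective action to each advisory rather than only reporting it.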