A Rule-Based Verification and Control Framework in ATLAS Trigger-DAQ

To meet the requirements of ATLAS experiment data taking, the Trigger-DAQ (TDAQ) system is composed of O(10000) applications running on more than 2600 networked computers. At this scale, software and hardware failures are frequent. To minimize system downtime, the TDAQ control system shall include advanced verification and diagnostic facilities. The operator shall be able to use the tests and expertise of the TDAQ and detector developers to diagnose and recover from errors, automatically where possible. The TDAQ control system is built as a distributed tree of controllers, where the behavior of each controller is defined in a rule-based language that allows easy customization. The control system also includes a verification framework that lets users develop and configure tests of varying complexity for any component in the system. It can be used as a stand-alone test facility for a small detector installation, as part of the general TDAQ initialization procedure, and to diagnose problems that occur at run time. The system is currently used for TDAQ commissioning in the ATLAS experimental zone and by subdetectors for stand-alone verification of detector hardware before its final installation.
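To make the idea of a controller tree with rule-defined behavior more concrete, here is a minimal Python sketch. It is not the TDAQ implementation and does not use its rule language: the `Rule` and `Controller` classes, the state names (`RUNNING`, `ERROR`, `EXCLUDED`, `DEGRADED`) and the `test:<name>` facts standing in for verification test results are all hypothetical assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical rule: a named condition over a controller plus an action to run when it holds.
@dataclass
class Rule:
    name: str
    condition: Callable[[Controller], bool]
    action: Callable[[Controller], None]


# Hypothetical controller node; its behavior comes entirely from its rule list.
@dataclass
class Controller:
    name: str
    rules: list[Rule] = field(default_factory=list)
    children: list[Controller] = field(default_factory=list)
    state: str = "RUNNING"
    facts: dict = field(default_factory=dict)  # e.g. results of verification tests
    log: list[str] = field(default_factory=list)

    def evaluate(self, max_passes: int = 10) -> None:
        # Evaluate children first so a parent reacts to its subtree's current state.
        for child in self.children:
            child.evaluate(max_passes)
        # Fire matching rules repeatedly until a pass fires none (bounded by max_passes).
        for _ in range(max_passes):
            fired = False
            for rule in self.rules:
                if rule.condition(self):
                    rule.action(self)
                    self.log.append(rule.name)
                    fired = True
            if not fired:
                break


def child_in_error(ctrl: Controller) -> bool:
    return any(c.state == "ERROR" for c in ctrl.children)


def recover(ctrl: Controller) -> None:
    for c in ctrl.children:
        if c.state == "ERROR":
            # Assumed: a verification test result for the child is stored in the parent's facts.
            if ctrl.facts.get(f"test:{c.name}", False):
                c.state = "RUNNING"   # test passed: restart the child
            else:
                c.state = "EXCLUDED"  # test failed: take the child out of the run
                ctrl.state = "DEGRADED"


# Usage: two failed segments, only one of which passes its verification test.
root = Controller("RootController", rules=[Rule("recover_failed_child", child_in_error, recover)])
a = Controller("SegmentA", state="ERROR")
b = Controller("SegmentB", state="ERROR")
root.children = [a, b]
root.facts = {"test:SegmentA": True, "test:SegmentB": False}
root.evaluate()
print(a.state, b.state, root.state)  # RUNNING EXCLUDED DEGRADED
```

In this sketch the recovery policy lives in `Rule` objects rather than in the `Controller` class, so a different subdetector could attach its own rule list without modifying the controller code, which is the kind of easy customization a rule-based approach aims for.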