High-performance scalable information service for the ATLAS experiment

The ATLAS experiment is being operated by highly distributed computing system which is constantly producing a lot of status information which is used to monitor the experiment operational conditions as well as to assess the quality of the physics data being taken. For example the ATLAS High Level Trigger(HLT) algorithms are executed on the online computing farm consisting from about 1500 nodes. Each HLT algorithm is producing few thousands histograms, which have to be integrated over the whole farm and carefully analyzed in order to properly tune the event rejection. In order to handle such non-physics data the Information Service (IS) facility has been developed in the scope of the ATLAS Trigger and Data Acquisition (TDAQ) project. The IS provides high-performance scalable solution for information exchange in distributed environment. In the course of an ATLAS data taking session the IS handles about hundred gigabytes of information which is being constantly updated with the update interval varying from a second to few tens of seconds. IS provides access to any information item on request as well as distributing notification to all the information subscribers. In latter case IS subscribers receive information within few milliseconds after it was updated. IS can handle arbitrary types of information including histograms produced by the HLT applications and provides C++, Java and Python API. The Information Service is a primarily and in most cases a unique source of information for the majority of the online monitoring analysis and GUI applications, used to control and monitor the ATLAS experiment. Information Service provides streaming functionality allowing efficient replication of all or part of the managed information. This functionality is used to duplicate the subset of the ATLAS monitoring data to the CERN public network with the latency of few milliseconds, allowing efficient real-time monitoring of the data taking from outside the protected ATLAS network. Each information item in IS has an associated URL which can be used to access that item online via HTTP protocol. This functionality is being used by many online monitoring applications which can run in a WEB browser, providing real-time monitoring information about ATLAS experiment over the globe. This paper will describe design and implementation of the IS and present performance results which have been taken in the ATLAS operational environment.