Anomaly Detection in Accelerator Facilities Using Machine Learning

Light source facilities typically operate about 5000 hours per year to support multiple beamlines. Reliability is a key measure of machine performance at such user facilities, and some facilities have achieved more than 95% beam reliability. Even so, 95% reliability over about 5000 hours still corresponds to roughly 250 hours of unplanned beam downtime per year, and every lost hour wastes operating costs and interrupts scheduled user experiments. Preventive maintenance of subsystems and quick recovery from downtime are the basic strategies for improving reliability. Recovery currently requires significant manual diagnosis effort. To reduce the time spent recovering from unexpected downtime, we take steps toward building solutions that can detect anomalous conditions caused by faulty subsystems. In this paper, we share our findings from an initial assessment of production logs and provide an overview of potential future directions.