论文信息 - Anomaly detection using Deep Autoencoders for the assessment of the quality of the data acquired by the CMS experiment

Anomaly detection using Deep Autoencoders for the assessment of the quality of the data acquired by the CMS experiment

The certification of the CMS experiment data as usable for physics analysis is a crucial task to ensure the quality of all physics results published by the collaboration. Currently, the certification conducted by human experts is labor intensive and based on the scrutiny of distributions integrated on several hours of data taking. This contribution focuses on the design and prototype of an automated certification system assessing data quality on a per-luminosity section (i.e. 23 seconds of data taking) basis. Anomalies caused by detector malfunctioning or sub-optimal reconstruction are difficult to enumerate a priori and occur rarely, making it difficult to use classical supervised classification methods such as feedforward neural networks. We base our prototype on a semi-supervised approach which employs deep autoencoders. This approach has been qualified successfully on CMS data collected during the 2016 LHC run: we demonstrate its ability to detect anomalies with high accuracy and low false positive rate, when compared against the outcome of the manual certification by experts. A key advantage of this approach over other machine learning technologies is the great interpretability of the results, which can be further used to ascribe the origin of the problems in the data to a specific sub-detector or physics objects.

[1] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[2] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[3] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4] Valdas Rapsevicius. CMS Run Registry: Data Certification Bookkeeping and Publication System , 2011 .

[5] Marc'Aurelio Ranzato,et al. Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[6] Naftali Tishby,et al. Deep learning and the information bottleneck principle , 2015, 2015 IEEE Information Theory Workshop (ITW).

[7] Naftali Tishby,et al. Opening the Black Box of Deep Neural Networks via Information , 2017, ArXiv.