Online Analysis of High-Volume Data Streams in Astroparticle Physics

Experiments in high-energy astroparticle physics produce large amounts of data as continuous high-volume streams. Gaining insights from the observed data poses challenges at several steps of an experiment's analysis chain. Machine learning methods have already found their way into some particular stages of the overall data processing. In this paper we investigate the deployment of machine learning methods at various stages of the data analysis chain in a gamma-ray astronomy experiment. Aiming at online and real-time performance, we build upon prominent software libraries and discuss the complete cycle of data processing, from raw-data capture to high-level classification, using a data-flow-based rapid-prototyping environment. In the context of a gamma-ray experiment, we review user requirements in this interdisciplinary setting and demonstrate the applicability of our approach in a real-world setting, delivering results from high-volume data streams in real time.
