Big Data Science

In ever more disciplines, science is driven by data, which makes data analytics a primary skill for researchers. This covers the complete process, from data acquisition at the sensors through pre-processing and feature extraction to the application of machine learning. Sensors often produce large volumes of data that must be handled in near real time, which requires a combined effort ranging from implementations at the hardware level to the high-level design of data flows. In this paper we outline two use cases that span this range of data analysis for science, using a real-world example from astroparticle physics. We also describe a high-level design approach that is capable of defining the complete data flow from sensor hardware to final analysis.
