Running Scientific Algorithms as Array Database Operators: Bringing the Processing Power to the Data
2016 IEEE International Conference on Big Data (Big Data)

Array databases are well suited for storing and processing large multidimensional data. However, they usually run rather simple operations that represent only single steps of scientific algorithms, so a way to run more complex logic is needed. In this paper, we study and test how to run entire scientific algorithms as native array database operators in a SciDB cluster. As a use case, we present our implementation of an iterative algorithm that reconstructs the distribution of plasma density in the solar corona at specific temperatures. This algorithm uses image series from NASA's Solar Dynamics Observatory (SDO) spacecraft, which we stack in a 3-dimensional array. For this use case, we measure a decrease in overall runtime of an order of magnitude. We also discuss different parameters used to scale up the array database cluster.
