AQUAdexIM: highly efficient in-memory indexing and querying of astronomy time series images

Astronomy has always been, and will continue to be, a data-driven science, and astronomers now face increasingly massive datasets. A key problem is how to efficiently retrieve the desired "cup of data" from this ocean. AQUAdexIM, a spatial indexing and querying method, performs highly efficient on-the-fly queries on the server side: in response to a user's request, it searches existing observation data for time series images and returns only the desired FITS images. Users therefore no longer need to download entire datasets to their local machines, an approach that becomes increasingly impractical as data volumes grow. Moreover, AQUAdexIM keeps storage overhead very low, and its specially designed in-memory index structure lets it search for time series images of a given area of the sky 10 times faster than Redis, a state-of-the-art in-memory database.
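To make the idea of an in-memory spatial index for time series images concrete, the following is a minimal sketch and not the AQUAdexIM data structure itself, whose design is not described in this abstract. It assumes a simple equal-angle RA/Dec grid, rectangular image footprints, and no handling of RA wraparound at 0/360 degrees. The names `Image`, `GridIndex`, `cell_deg`, and the file paths are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Image:
    """Metadata for one FITS image (hypothetical fields, for illustration)."""
    path: str        # location of the FITS file on the server
    obs_time: float  # observation time, e.g. Modified Julian Date
    ra_min: float    # footprint bounds in degrees (assumed rectangular)
    ra_max: float
    dec_min: float
    dec_max: float

    def contains(self, ra: float, dec: float) -> bool:
        return (self.ra_min <= ra <= self.ra_max
                and self.dec_min <= dec <= self.dec_max)


class GridIndex:
    """In-memory index mapping sky grid cells to the images that overlap them."""

    def __init__(self, cell_deg: float = 1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)  # (i, j) cell -> list of Image

    def _cell(self, ra: float, dec: float):
        return (math.floor(ra / self.cell_deg),
                math.floor(dec / self.cell_deg))

    def add(self, img: Image) -> None:
        # Register the image in every cell its bounding box touches.
        i0, j0 = self._cell(img.ra_min, img.dec_min)
        i1, j1 = self._cell(img.ra_max, img.dec_max)
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                self.cells[(i, j)].append(img)

    def query(self, ra: float, dec: float):
        # Look up one cell, filter by exact footprint, then order by time
        # so the result reads as a time series of images for that position.
        candidates = self.cells.get(self._cell(ra, dec), [])
        hits = [img for img in candidates if img.contains(ra, dec)]
        return sorted(hits, key=lambda im: im.obs_time)


index = GridIndex(cell_deg=1.0)
index.add(Image("a.fits", 57001.0, 10.0, 10.5, 20.0, 20.5))
index.add(Image("b.fits", 57000.0, 10.2, 10.7, 20.2, 20.7))
index.add(Image("c.fits", 57002.0, 50.0, 50.5, -5.0, -4.5))
series = [img.path for img in index.query(10.3, 20.3)]
```

Only the matching file paths would be returned to the user, which reflects the server-side retrieval described above; a production index would more likely use a proper spherical tessellation instead of a flat grid.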
