Time travel in a scientific array database

In this paper, we present TimeArr, a new storage manager for an array database. TimeArr supports the creation of a sequence of versions of each stored array and their exploration through two types of time travel operations: selection of a specific version of a (sub)-array, and a more general extraction of a (sub)-array history in the form of a series of (sub)-array versions. TimeArr contributes a combination of array-specific storage techniques to efficiently support these operations. To speed up array exploration, TimeArr further introduces two additional techniques. The first is the notion of approximate time travel, with two types of operations: approximate version selection and approximate history. For these operations, users can tune the tolerable degree of approximation and thus trade off accuracy and performance in a principled manner. The second is to lazily create short connections, called skip links, between versions of the same (sub)-array that have similar data patterns, which speeds up the selection of a specific version. We implement TimeArr within the SciDB array processing engine and demonstrate its performance through experiments on two real datasets from the astronomy and earth sciences domains.
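To make the two time travel operations concrete, the following is a minimal sketch in Python and not TimeArr's implementation. It assumes a one-dimensional array of numbers, keeps the latest version in full, and stores older versions as backward deltas; that layout, the class name `VersionedArray`, and all method names are hypothetical choices made for illustration. The approximate selection shown here simply skips deltas whose magnitude is within a user-chosen tolerance, which is a stand-in for the tunable approximation described above. Skip links are not modeled.

```python
class VersionedArray:
    """Toy versioned 1-D array: latest version in full, older ones as backward deltas.

    Versions are numbered from 1. This layout is an assumption for illustration.
    """

    def __init__(self, first):
        self.latest = list(first)
        # deltas[i] turns version i+2 into version i+1 when added cell by cell.
        self.deltas = []

    @property
    def num_versions(self):
        return len(self.deltas) + 1

    def append_version(self, new):
        new = list(new)
        if len(new) != len(self.latest):
            raise ValueError("all versions must have the same shape")
        self.deltas.append([old - cur for old, cur in zip(self.latest, new)])
        self.latest = new

    def _check(self, version):
        if not 1 <= version <= self.num_versions:
            raise IndexError("no such version")

    def select(self, version, lo=0, hi=None):
        """Exact version selection for the sub-array of cells [lo, hi)."""
        self._check(version)
        hi = len(self.latest) if hi is None else hi
        cells = self.latest[lo:hi]
        # Walk backward from the latest version, touching only the sub-array.
        for delta in reversed(self.deltas[version - 1:]):
            cells = [c + d for c, d in zip(cells, delta[lo:hi])]
        return cells

    def history(self, first, last, lo=0, hi=None):
        """Sub-array history: versions first..last, oldest first."""
        self._check(first)
        if first > last:
            raise ValueError("first must not exceed last")
        hi = len(self.latest) if hi is None else hi
        cells = self.select(last, lo, hi)
        out = [cells]
        for delta in reversed(self.deltas[first - 1:last - 1]):
            cells = [c + d for c, d in zip(cells, delta[lo:hi])]
            out.append(cells)
        out.reverse()
        return out

    def select_approx(self, version, tolerance, lo=0, hi=None):
        """Approximate selection: skip deltas whose largest change is <= tolerance.

        Returns (cells, error_bound), where error_bound is tolerance times the
        number of skipped deltas, an upper bound on the per-cell error.
        """
        self._check(version)
        hi = len(self.latest) if hi is None else hi
        cells = self.latest[lo:hi]
        skipped = 0
        for delta in reversed(self.deltas[version - 1:]):
            part = delta[lo:hi]
            if max((abs(d) for d in part), default=0) <= tolerance:
                skipped += 1  # cheap path: this delta is not applied
                continue
            cells = [c + d for c, d in zip(cells, part)]
        return cells, skipped * tolerance
```

For example, after storing the versions `[1, 2, 3, 4]`, `[1, 2, 5, 4]`, and `[2, 2, 5, 9]`, calling `select(2, 2, 4)` reconstructs cells 2 and 3 of the second version, while `select_approx(1, tolerance=2)` skips the small delta between the first two versions and reports an error bound of 2. Raising the tolerance trades accuracy for fewer delta applications.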