Distributed In-Memory Analytics for Big Temporal Data

Temporal data is ubiquitous, and massive amounts of it are generated every day. Managing big temporal data is important yet challenging, and processing it on a distributed system is a natural choice. However, existing distributed systems and methods either cannot support native queries or are disk-based solutions, so they cannot adequately satisfy the requirements of high throughput and low latency. To address this issue, this paper proposes an In-memory based Two-level Index Solution in Spark (ITISS) for processing big temporal data. The framework of our system is easy to understand and implement without sacrificing efficiency. We conduct extensive experiments to verify the performance of our solution. Experimental results on both real and synthetic datasets consistently demonstrate that our solution is efficient and competitive.
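The abstract does not describe how the two-level index is organized, so the following is only an illustrative sketch of the general idea, not the ITISS design. It assumes that records are time intervals, that a global level keeps one time bound per partition for pruning, and that a local level keeps each partition sorted by start time. Plain Python lists stand in for Spark partitions, and all names (`Partition`, `build_index`, `stabbing_query`) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One in-memory partition plus the summary used by the global level."""
    records: list = field(default_factory=list)  # (start, end, value), sorted by start
    starts: list = field(default_factory=list)   # start times only, for binary search
    min_start: float = float("inf")              # global-level bound: earliest start
    max_end: float = float("-inf")               # global-level bound: latest end


def build_index(records, num_partitions):
    """Range-partition intervals by start time and summarize each partition."""
    ordered = sorted(records, key=lambda r: r[0])
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    partitions = []
    for i in range(0, len(ordered), size):
        chunk = ordered[i:i + size]
        partitions.append(Partition(
            records=chunk,
            starts=[r[0] for r in chunk],
            min_start=chunk[0][0],
            max_end=max(r[1] for r in chunk),
        ))
    return partitions


def stabbing_query(partitions, t):
    """Return all intervals [start, end] that contain time t."""
    hits = []
    for p in partitions:
        # Global level: skip partitions whose time bounds cannot contain t.
        if p.min_start <= t <= p.max_end:
            # Local level: binary search for records starting at or before t,
            # then keep those that have not yet ended at t.
            upto = bisect_right(p.starts, t)
            hits.extend(r for r in p.records[:upto] if r[1] >= t)
    return hits


records = [(1, 5, "a"), (2, 3, "b"), (4, 10, "c"), (8, 9, "d"), (12, 15, "e")]
index = build_index(records, num_partitions=2)
print([r[2] for r in stabbing_query(index, 9)])
```

In this sketch the global bounds let a query skip whole partitions, and the sorted local layout avoids scanning records that start after the query time. A real in-memory deployment would presumably keep the partitions cached across queries, but that part is not modeled here.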
