Efficiently answer top-k queries on typed intervals

Abstract

Consider a database consisting of a set of tuples, each of which contains an interval, a type and a weight. These tuples are called typed intervals and are used to support applications involving diverse intervals. In this paper, we study top-k queries on typed intervals. Such a query reports the k intervals that intersect the query time, contain a particular type and have the largest weights. The query time can be a point or an interval. Further, we define top-k continuous queries, which return the qualified intervals at each time point during the query interval. To answer such queries efficiently, a key challenge is to build an index structure that manages typed intervals. Employing the standard interval tree, we build the structure in a compact way to reduce the I/O cost, and provide analytically derived partitioning methods to manage the data. Query algorithms are proposed to support point, interval and continuous queries. An auxiliary main-memory structure is developed to report continuous results. Using large real and synthetic datasets, extensive experiments are performed in a prototype database system to demonstrate the effectiveness, efficiency and scalability of the approach. The results show that our method significantly outperforms alternative methods in most settings.
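To make the query semantics concrete, the following is a minimal sketch of the three query kinds as a naive linear scan. It is not the index structure or the query algorithms proposed in the paper; it only illustrates what a correct answer looks like. The names `TypedInterval`, `top_k` and `top_k_continuous` are hypothetical, and the sketch assumes closed intervals over integer time points with a single type per tuple.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class TypedInterval:
    # Assumption: closed interval [start, end] over integer time points.
    start: int
    end: int
    type: str      # assumption: exactly one type per tuple
    weight: float


def top_k(data, k, q_start, q_end, q_type):
    """Naive scan: the k heaviest intervals of type q_type that
    intersect [q_start, q_end]. A point query uses q_start == q_end."""
    hits = [
        t for t in data
        if t.type == q_type and t.start <= q_end and q_start <= t.end
    ]
    # nlargest returns the results sorted by descending weight.
    return heapq.nlargest(k, hits, key=lambda t: t.weight)


def top_k_continuous(data, k, q_start, q_end, q_type):
    """Continuous query: one point query per integer time point
    in [q_start, q_end], returned as {time point: top-k list}."""
    return {
        t: top_k(data, k, t, t, q_type)
        for t in range(q_start, q_end + 1)
    }


if __name__ == "__main__":
    intervals = [
        TypedInterval(1, 5, "A", 3.0),
        TypedInterval(4, 9, "A", 7.0),
        TypedInterval(2, 3, "B", 9.0),
        TypedInterval(6, 8, "A", 5.0),
    ]
    print(top_k(intervals, 2, 4, 4, "A"))             # point query at time 4
    print(top_k(intervals, 2, 4, 6, "A"))             # interval query [4, 6]
    print(top_k_continuous(intervals, 2, 5, 6, "A"))  # continuous over [5, 6]
```

The scan touches every tuple for every query, and the continuous variant repeats it once per time point. An index that avoids this cost is the motivation stated in the abstract.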
