Top-k queries on temporal data

The database community has devoted extensive effort to indexing and querying temporal data over the past decades. However, temporal ranking queries have received insufficient attention. Given any time instance t, such a query asks for the top-k objects at time t with respect to some score attribute. Some generic indexing structures based on R-trees do support ranking queries on temporal data, but because they are not tailored to such queries, their performance is far from satisfactory. We present the Seb-tree, a simple indexing scheme that supports temporal ranking queries much more efficiently. The Seb-tree answers a top-k query for any time instance t in the optimal number of I/Os in expectation, namely $O\left(\log_B \frac{N}{B} + \frac{k}{B}\right)$ I/Os, where N is the size of the data set and B is the disk block size. The index has near-linear size (for constant and reasonable values of $k_{\max}$, the largest value the query parameter k may take), can be constructed in near-linear time, and supports insertions and deletions without affecting its query performance guarantee. Above all, the Seb-tree is especially appealing in practice because of its simplicity: it uses the B-tree as its only building block. Extensive experiments on a number of large data sets show that the Seb-tree is more than an order of magnitude faster than R-tree based indexes for temporal ranking queries.
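To make the query semantics concrete, the following is a minimal brute-force sketch of a temporal top-k query. It is not the Seb-tree and makes no attempt at I/O efficiency; it only illustrates what the query returns. The data representation is an assumption made for illustration: each object is a list of time-stamped score samples with strictly increasing times, and the score between samples is linearly interpolated. The names `score_at` and `top_k_at` are hypothetical.

```python
import heapq
from bisect import bisect_right


def score_at(samples, t):
    """Score of one object at time t, or None if t is outside its lifetime.

    Assumption: `samples` is a list of (time, score) pairs sorted by strictly
    increasing time, and the score is linear between consecutive samples.
    """
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        return None
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    if i == len(samples):  # t coincides with the last sample
        return samples[-1][1]
    (t0, s0), (t1, s1) = samples[i - 1], samples[i]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


def top_k_at(objects, t, k):
    """Brute-force top-k at time t: scan every object, keep the k best scores."""
    scored = []
    for obj_id, samples in objects.items():
        s = score_at(samples, t)
        if s is not None:  # skip objects that do not exist at time t
            scored.append((s, obj_id))
    # nlargest orders by score first; ties are broken by object id
    return [(obj_id, s) for s, obj_id in heapq.nlargest(k, scored)]


# Hypothetical toy data: three objects with piecewise-linear scores.
objects = {
    "a": [(0, 1.0), (10, 11.0)],  # rising score
    "b": [(0, 8.0), (10, 2.0)],   # falling score
    "c": [(5, 6.0), (10, 6.0)],   # exists only from t = 5 onward
}

early = [obj_id for obj_id, _ in top_k_at(objects, t=2, k=2)]
late = [obj_id for obj_id, _ in top_k_at(objects, t=8, k=2)]
print(early, late)
```

This scan touches every object for every query, so its cost grows linearly with N. That linear cost is the behavior an index with the logarithmic query bound stated above is meant to avoid.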
