Efficient Management of Short-Lived Data

Motivated by the increasing prominence of loosely coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with expiration times. When data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which combine a binary search tree with respect to a primary attribute and a heap with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus maintaining a priority queue of tuples to expire. A detailed experimental study demonstrates that the approach is well behaved and scalable, and that it is efficient compared with a number of competing approaches.
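
To make the treap idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes integer primary keys and numeric expiration times, and all names (Node, split, merge, insert, expire, keys) are hypothetical. Nodes are immutable, so every update returns a new root and older versions stay readable, which is what persistence means here. Because the heap order is driven by expiration times rather than random priorities, this sketch makes no claim about tree balance.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Immutable treap node: BST order on key, min-heap order on exp."""
    key: int
    exp: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def split(t, key):
    """Return (subtree with keys < key, subtree with keys > key).
    A node whose key equals `key` is dropped."""
    if t is None:
        return None, None
    if key < t.key:
        l, r = split(t.left, key)
        return l, t._replace(left=r)
    if key > t.key:
        l, r = split(t.right, key)
        return t._replace(right=l), r
    return t.left, t.right


def merge(a, b):
    """Merge two treaps, assuming every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.exp <= b.exp:
        return a._replace(right=merge(a.right, b))
    return b._replace(left=merge(a, b.left))


def insert(t, key, exp):
    """Return a new treap containing (key, exp); an existing key is replaced."""
    l, r = split(t, key)
    return merge(merge(l, Node(key, exp)), r)


def expire(t, now):
    """Return a new treap without tuples whose expiration time is <= now.
    The root always holds the smallest expiration time, so only the root is checked."""
    while t is not None and t.exp <= now:
        t = merge(t.left, t.right)
    return t


def keys(t):
    """In-order traversal: primary keys in ascending order."""
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


# Usage: each call yields a new version; v3 is untouched by expire().
v1 = insert(None, 5, 100)
v2 = insert(v1, 3, 50)
v3 = insert(v2, 8, 70)
v4 = expire(v3, 60)
```

In this sketch the tuple that expires next is always at the root, so checking for expired data costs a single comparison, and removing an expired tuple amounts to merging the root's two subtrees.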
