Granularity reduction in temporal document databases

With rapidly decreasing storage costs, temporal document databases are now a viable solution in many contexts. However, storing an ever-growing database can still be too costly, so it is desirable to be able to physically delete old versions of data. Traditionally, this has been done by an operation called vacuuming, in which the oldest versions are physically deleted or migrated from secondary storage to less costly tertiary storage. In temporal document databases, on the other hand, it is often more appropriate to remove intermediate versions than to remove the oldest ones. We call this operation granularity reduction. In this paper, we describe the concept of granularity reduction and present six strategies for selecting the document versions to eliminate. Three of the strategies have been implemented in the V2 temporal document database system, and in this context we discuss the cost of applying them.
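
The contrast between vacuuming and granularity reduction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Version` record, the integer timestamps, and in particular the selection rule (keep only the newest version in each fixed-width time interval), which is not claimed to be one of the six strategies presented in the paper or the way V2 implements them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical record for one stored version of a single document."""
    timestamp: int  # commit time in arbitrary units, e.g. days
    content: str


def vacuum(versions, cutoff):
    """Traditional vacuuming: discard every version older than `cutoff`."""
    return [v for v in sorted(versions, key=lambda v: v.timestamp)
            if v.timestamp >= cutoff]


def reduce_granularity(versions, interval):
    """Illustrative granularity reduction: split the time line into buckets
    of width `interval` and keep only the newest version in each bucket."""
    newest_in_bucket = {}
    for v in sorted(versions, key=lambda v: v.timestamp):
        # Later versions overwrite earlier ones that fall in the same bucket.
        newest_in_bucket[v.timestamp // interval] = v
    return sorted(newest_in_bucket.values(), key=lambda v: v.timestamp)


history = [Version(t, f"text at t={t}") for t in (1, 2, 3, 12, 13, 25)]

vacuumed = vacuum(history, cutoff=12)
reduced = reduce_granularity(history, interval=10)

print([v.timestamp for v in vacuumed])  # [12, 13, 25]
print([v.timestamp for v in reduced])   # [3, 13, 25]
```

In this toy history, vacuuming loses everything before the cutoff, whereas the reduced history still spans the whole period at a coarser resolution.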
