Disk Management for Object-Oriented Databases (Student Paper)

An object-oriented database provides persistent storage for a large number of objects. These objects may be very small, and their access patterns are likely to be less uniform than the mostly sequential reads and writes seen in file-systems [1]. For example, the OO7 benchmark for object-oriented databases specifies a number of traversals that follow pointers around a graph of objects [2]. Given these differences between file-systems and object-oriented databases, disk management techniques used in file-systems will not perform well if naively applied to object-oriented databases. In this paper I propose three disk management strategies for object-oriented databases. These strategies are based on earlier work on file-systems; they differ from this earlier work in their support for a large number of small objects and non-sequential access patterns.
1 Background
Object-oriented databases are usually built around a transaction system because most meaningful database operations require reading and writing multiple objects. The proposed strategies exploit a feature of transactional systems: modifications do not have to be made persistent until transaction commit. Techniques similar to the ones proposed here could be used in a system without transactions if modifications do not have to be made persistent at once but can be buffered in memory for short periods of time. (The thirty-second write delay in some file-systems is an example.)
The rest of this section briefly describes techniques used in various disk-based systems that have been adapted to fit into the proposed disk management strategies.
1.1 Caching
Many systems cache data in volatile memory to reduce the number of disk accesses required to satisfy read requests. Caching can vastly improve read performance. However, for reliability reasons, modifications have to be made persistent at transaction commit, and therefore caching does not directly affect write performance.
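The interaction between caching and commit can be illustrated with a minimal Python sketch. It is not the design proposed in this paper. `SimulatedDisk` and `CachingStore` are hypothetical names. The disk is modeled as a dictionary that counts single-object reads and writes. The cache uses least-recently-used replacement, which is an assumption because the text does not name a policy. Repeated reads are served from memory, but every object modified in a transaction still costs a disk write at commit.

```python
from collections import OrderedDict


class SimulatedDisk:
    """Hypothetical disk model: counts single-object reads and writes."""

    def __init__(self):
        self.store = {}   # oid -> committed value
        self.reads = 0
        self.writes = 0

    def read(self, oid):
        self.reads += 1
        return self.store[oid]

    def write(self, oid, value):
        self.writes += 1
        self.store[oid] = value


class CachingStore:
    """Hypothetical store: LRU read cache in volatile memory; modifications
    are buffered until commit and then forced to disk."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.cache = OrderedDict()  # oid -> committed value, least recently used first
        self.pending = {}           # modifications of the current transaction

    def _install(self, oid, value):
        self.cache[oid] = value
        self.cache.move_to_end(oid)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used object

    def read(self, oid):
        if oid in self.pending:             # a transaction sees its own writes
            return self.pending[oid]
        if oid in self.cache:               # cache hit: no disk access
            self.cache.move_to_end(oid)
            return self.cache[oid]
        value = self.disk.read(oid)         # cache miss: one disk access
        self._install(oid, value)
        return value

    def write(self, oid, value):
        self.pending[oid] = value           # buffered in memory, not yet persistent

    def commit(self):
        # Reliability requires every modified object to reach the disk at commit,
        # so the cache does not reduce the number of writes here.
        for oid, value in self.pending.items():
            self.disk.write(oid, value)
            self._install(oid, value)
        self.pending.clear()
```

Buffering until commit also collapses repeated modifications of the same object within one transaction into a single disk write. The number of commit-time writes still depends on how many distinct objects were modified, not on the cache.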
Disks tend to perform poorly if the unit of transfer between disk and memory is small, because seek and rotational delays dominate the cost of each small operation. Much higher disk throughput can be achieved by reading and writing large amounts of data in one disk operation. Therefore, if certain objects are likely to be read together, they can be stored contiguously on disk and read into the cache in one disk operation. This technique can significantly improve the performance of a disk-bound system, but it results in wasted work if reference patterns change so that they no longer match the layout of objects on disk.
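The effect of contiguous placement can be shown with a small Python model. It is a sketch under stated assumptions, not the proposed strategy. `SegmentedDisk` and `traverse` are hypothetical names. One segment read stands for one large disk operation, and the cache is assumed to be unbounded so that only layout matters. The model compares a clustered layout with a scattered one for the same traversal, and it counts the objects that are transferred but never used.

```python
class SegmentedDisk:
    """Hypothetical disk model: objects are stored contiguously in segments,
    and reading a segment transfers all of its objects in one disk operation."""

    def __init__(self, layout):
        # layout: list of segments, each a list of object ids stored contiguously
        self.segments = [list(seg) for seg in layout]
        self.segment_of = {oid: i for i, seg in enumerate(self.segments) for oid in seg}
        self.operations = 0

    def read_segment(self, index):
        self.operations += 1
        return self.segments[index]


def traverse(disk, access_order):
    """Visit objects in access_order, caching every object that a segment read
    brings in (assumed unbounded cache). Returns the number of objects that
    were fetched but never used, i.e. wasted transfer."""
    cached = set()
    for oid in access_order:
        if oid not in cached:
            cached.update(disk.read_segment(disk.segment_of[oid]))
    return len(cached - set(access_order))
```

Consider a traversal of objects 0 through 7. If those objects are stored in two contiguous segments, the traversal needs two segment reads and fetches nothing extra. If the same objects are interleaved with unrelated objects 8 through 15 across four segments, the traversal needs four reads and brings in eight unused objects. With the clustered layout, a changed reference pattern that touches only objects 0 and 4 still reads both segments and wastes six object transfers.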