Data placement techniques for multimedia hierarchical storage systems

Due to the information explosion we are witnessing, a growing number of applications store, maintain, and retrieve large volumes of data that must be available online or near-online. These data repositories are implemented using hierarchical storage structures (HSS). One component of HSS is tertiary storage, which provides cost-effective storage for the vast amount of data manipulated by these applications. However, the 3–4 orders of magnitude difference in access time between tertiary storage and secondary storage must be bridged to allow online or near-online access to tertiary-resident data. This wide access gap is mainly due to two factors: the sequential nature of the most popular tertiary technologies (i.e., tapes) and the low ratio of drives to media in tertiary storage jukeboxes. In my dissertation, I propose a novel data placement technique specifically designed for serpentine tape technology, namely Wrap ARound data Placement (WARP). I focus on tape technology because tapes provide the most cost-effective storage for very large databases, and more specifically on serpentine tapes because they are increasingly the technology of choice for mid-range and high-end systems, accounting for more than 60% of the total tape storage market. WARP may reduce access time by an order of magnitude, depending on the specifications of the tape device and the object sizes. An important feature of WARP is that it optimizes access time independently of the retrieval order. This is achieved by exploiting the characteristics of serpentine tape technology rather than the characteristics of the application. I have implemented WARP and other traditional data placement techniques on an IBM-3590 tape drive and have observed up to an order of magnitude improvement in reposition time compared to the other data placement techniques.
Moreover, with WARP, the variance between the best, worst, and average access times is very small, which allows time-sensitive applications (e.g., real-time applications) to predict access time behavior a priori. As part of my study, I have also considered the following issues: (1) multiple load/unload positions, (2) multiple partitions, (3) multiplexing, (4) indexing for small objects, and (5) cost analysis.
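
To illustrate the kind of serpentine tape characteristic referred to above, the following is a minimal sketch of a toy geometric model. It is not the WARP algorithm and does not model the IBM-3590. It assumes a tape whose wraps alternate direction, a hypothetical wrap length `wrap_len` in blocks, and a hypothetical fixed `wrap_switch_cost`. All function names are illustrative. Under these assumptions, two blocks that are far apart in logical order can lie at nearly the same longitudinal position on the tape, so repositioning between them is cheap.

```python
def longitudinal_position(block, wrap_len):
    """Map a logical block number to (wrap index, position along the tape).

    Toy assumption: even-numbered wraps run forward, odd-numbered wraps
    run backward, and every wrap holds exactly wrap_len blocks.
    """
    wrap, offset = divmod(block, wrap_len)
    pos = offset if wrap % 2 == 0 else wrap_len - 1 - offset
    return wrap, pos


def reposition_cost(src, dst, wrap_len, wrap_switch_cost=1):
    """Toy reposition cost: longitudinal distance plus a fixed penalty
    when the head has to move to a different wrap (assumed constant)."""
    src_wrap, src_pos = longitudinal_position(src, wrap_len)
    dst_wrap, dst_pos = longitudinal_position(dst, wrap_len)
    switch = wrap_switch_cost if src_wrap != dst_wrap else 0
    return abs(src_pos - dst_pos) + switch


if __name__ == "__main__":
    wrap_len = 100  # hypothetical blocks per wrap
    for src, dst in [(0, 99), (0, 100), (0, 199), (99, 100)]:
        print(src, dst, reposition_cost(src, dst, wrap_len))
```

In this toy model, block 199 sits at the end of the second wrap, which brings it back to the same longitudinal position as block 0, so moving between them costs only the wrap switch, whereas moving from block 0 to block 99 requires traversing a full wrap.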