Incremental Organization for Data Recording and Warehousing

Data warehouses and recording systems typically receive a large, continuous stream of incoming data that must be stored in a manner suitable for future access. Access to stored records is usually based on a key. Organizing the data on disk as it arrives using standard techniques would require either (a) one or more I/Os to store each incoming record (to keep the data clustered by key), which is too expensive when data arrival rates are very high, or (b) many I/Os to locate the records for a particular key value, such as a customer (if data is stored clustered by arrival order). We study two techniques, inspired by external sorting algorithms, that store data incrementally as it arrives while providing good performance for both recording and querying. We present concurrency control and recovery schemes for both techniques, and we show their benefits both analytically and experimentally.
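To make the recording/querying trade-off concrete, here is a minimal Python sketch of one scheme in the spirit of external sorting. It is not necessarily either of the two techniques studied here. It assumes, hypothetically, that incoming records are buffered in memory, flushed as key-sorted runs, and that whenever `fanout` runs accumulate at a level they are k-way merged into a single run at the next level. The class name `SteppedRunStore`, its parameters `buffer_capacity` and `fanout`, and the use of in-memory lists to stand in for on-disk runs are all illustrative assumptions.

```python
import bisect
import heapq
from typing import Any, List, Tuple

Record = Tuple[int, Any]  # (key, payload)


class SteppedRunStore:
    """Hypothetical sketch: buffer incoming records, flush them as key-sorted
    runs, and merge `fanout` runs at one level into a single run at the next."""

    def __init__(self, buffer_capacity: int = 4, fanout: int = 3) -> None:
        self.buffer_capacity = buffer_capacity
        self.fanout = fanout
        self.buffer: List[Record] = []
        # levels[i] holds the sorted runs currently at level i (stand-ins for disk runs)
        self.levels: List[List[List[Record]]] = []

    def insert(self, key: int, payload: Any) -> None:
        # Records arrive in arbitrary key order; inserting only appends to the buffer,
        # so no per-record I/O is needed to keep data clustered by key.
        self.buffer.append((key, payload))
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self) -> None:
        # Sort by key only; the stable sort keeps equal keys in arrival order.
        run = sorted(self.buffer, key=lambda r: r[0])
        self.buffer = []
        self._add_run(0, run)

    def _add_run(self, level: int, run: List[Record]) -> None:
        while len(self.levels) <= level:
            self.levels.append([])
        self.levels[level].append(run)
        if len(self.levels[level]) >= self.fanout:
            # k-way merge of all runs at this level, as in external merge sort.
            merged = list(heapq.merge(*self.levels[level], key=lambda r: r[0]))
            self.levels[level] = []
            self._add_run(level + 1, merged)

    def lookup(self, key: int) -> List[Any]:
        # Scan the in-memory buffer, then binary-search every run.
        hits = [p for k, p in self.buffer if k == key]
        for runs in self.levels:
            for run in runs:
                # (key,) sorts before every (key, payload), so this finds the
                # first record with that key without comparing payloads.
                i = bisect.bisect_left(run, (key,))
                while i < len(run) and run[i][0] == key:
                    hits.append(run[i][1])
                    i += 1
        return hits

    def run_count(self) -> int:
        # Number of runs a lookup must probe: a rough proxy for query I/Os.
        return sum(len(runs) for runs in self.levels)


store = SteppedRunStore(buffer_capacity=2, fanout=2)
for i, k in enumerate([5, 3, 5, 1, 2, 5, 4, 0]):
    store.insert(k, i)
print(sorted(store.lookup(5)), store.run_count())
```

In this sketch, a larger `fanout` means each record is rewritten by fewer merge passes (cheaper recording), but more runs can sit at each level, so a lookup may have to probe more runs (costlier querying).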