Advanced query processing in databases

As random access memory gets cheaper, it becomes increasingly affordable to build computers with large main memories. However, main memory data processing is not as simple as increasing the buffer pool size. An important issue is cache behavior, because cache optimization differs from buffer optimization in a disk-based system. Indexing structures can reduce overall computation time without using too much space. In this thesis, I studied the cache behavior of several existing indexing structures in main memory. I designed and implemented two kinds of cache sensitive indexing structures, CSS-Trees and CSB+-Trees. Pointer elimination techniques significantly reduce the number of cache misses in both structures. As a result, searching these cache conscious indexing structures is much faster than searching existing tree-based indexes. Unlike CSS-Trees, CSB+-Trees support incremental updates and are therefore useful to a larger number of applications.

In this thesis, I also describe the Columbia Main Memory Database System. The goal of the system is to prototype a main memory based decision support system that provides fast query processing by improving cache behavior. The system uses a cache conscious data layout to store tables in main memory, which increases spatial locality during table scans. It also employs cache conscious data processing algorithms. In particular, it uses cache sensitive indexing structures to perform indexed nested loop joins.

The second part of this thesis focuses on complex query processing and optimization. Complex queries are common in decision support systems. As part of this thesis, I designed and implemented a new “invariant” technique that can evaluate arbitrary correlated queries efficiently. The technique enhances the execution engine so that it recognizes the part of a subquery that is uncorrelated and tries to cache and reuse that invariant result. The method also teaches a conventional query optimizer to understand the invariant feature, which allows it to generate better plans. The technique has been incorporated into a commercial release of Sybase IQ. (Abstract shortened by UMI.)
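The idea behind the invariant technique, caching the uncorrelated part of a correlated subquery and reusing it for every outer row, can be illustrated with a minimal sketch. The tables, column names, and function names below are hypothetical and chosen only for illustration; this is not the thesis's implementation or Sybase IQ's, and a real engine would apply the idea inside its query plan rather than over Python lists.

```python
# Illustrative sketch only: the tables, columns, and function names are
# hypothetical and are not taken from the thesis or from Sybase IQ.
#
# Query being modeled:
#   SELECT c.id,
#          (SELECT SUM(o.amount) FROM orders o
#           WHERE o.region = 'east'     -- uncorrelated (invariant) predicate
#             AND o.cust = c.id)        -- correlated predicate
#   FROM customers c

orders = [
    {"cust": 1, "amount": 50, "region": "east"},
    {"cust": 1, "amount": 70, "region": "west"},
    {"cust": 2, "amount": 20, "region": "east"},
    {"cust": 2, "amount": 90, "region": "east"},
]
customers = [{"id": 1}, {"id": 2}, {"id": 3}]


def naive(customers, orders):
    """Re-evaluate the whole subquery, including the invariant part, per outer row."""
    result = []
    for c in customers:
        total = sum(
            o["amount"]
            for o in orders
            if o["region"] == "east" and o["cust"] == c["id"]
        )
        result.append((c["id"], total))
    return result


def with_invariant_cache(customers, orders):
    """Evaluate the uncorrelated part once, then reuse it for every outer row."""
    # This filter does not depend on the outer row, so it is computed once.
    invariant = [o for o in orders if o["region"] == "east"]
    result = []
    for c in customers:
        # Only the correlated predicate is re-evaluated per outer row.
        total = sum(o["amount"] for o in invariant if o["cust"] == c["id"])
        result.append((c["id"], total))
    return result


# Note: Python's sum() over no rows gives 0, whereas SQL SUM would give NULL.
print(with_invariant_cache(customers, orders))
```

Both functions return the same rows. The second avoids repeating the region filter for each customer, which is the kind of saving the abstract attributes to reusing the invariant result.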
