dsmDB: Clustering in-memory Database Management Systems

Database management systems used in practice inherit their design from the early work of the database community, carried out at a time when systems had limited memory and processing resources and architectures were mainly centralized. Databases are often the bottleneck of performance-critical systems, because they rely heavily on stable storage and on mechanisms that ensure concurrent transactions execute correctly. In this thesis we investigate the dsmDB approach to clustering in-memory databases. dsmDB distributes database computation and storage over a cluster of machines, and improves performance by emphasizing in-memory computation and minimizing disk use. It does so by running an optimistic concurrency control mechanism on top of an in-memory storage layer that guarantees only weak consistency. Combining the two components yields both high performance and strong consistency. The resulting architecture is also flexible enough to recover the state of crashed nodes from the state of the surviving nodes, and to expand incrementally by adding nodes at runtime.
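
To make the idea of optimistic concurrency control over an in-memory store concrete, the following is a minimal single-process sketch. It is not the dsmDB protocol: the names VersionedStore, Transaction, and try_commit are hypothetical, the store is a plain Python dictionary, and distribution across nodes and the weak consistency of the storage layer are not modeled. It only illustrates the general pattern in which transactions run without locks, buffer their writes, and are validated at commit time against the versions they read.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store mapping key -> (version, value)."""

    def __init__(self):
        self._data = {}
        # Assumption: a single lock makes validate-and-write atomic.
        self._commit_lock = threading.Lock()

    def read(self, key):
        # Missing keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def try_commit(self, read_versions, writes):
        """Apply writes only if every key read is still at the version seen."""
        with self._commit_lock:
            for key, seen_version in read_versions.items():
                if self.read(key)[0] != seen_version:
                    return False  # conflict: another transaction committed first
            for key, value in writes.items():
                current_version = self.read(key)[0]
                self._data[key] = (current_version + 1, value)
            return True


class Transaction:
    """Buffers writes locally and validates its reads at commit time."""

    def __init__(self, store):
        self._store = store
        self._read_versions = {}
        self._writes = {}

    def get(self, key):
        if key in self._writes:
            return self._writes[key]  # read your own uncommitted write
        version, value = self._store.read(key)
        # Remember the first version observed for later validation.
        self._read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        return self._store.try_commit(self._read_versions, self._writes)


if __name__ == "__main__":
    store = VersionedStore()
    setup = Transaction(store)
    setup.put("balance", 100)
    setup.commit()

    t1, t2 = Transaction(store), Transaction(store)
    t1.put("balance", t1.get("balance") - 30)
    t2.put("balance", t2.get("balance") + 50)
    print(t1.commit(), t2.commit(), store.read("balance"))
```

In this sketch the second transaction fails validation because the first one has already advanced the version of the key it read. A caller would typically retry the aborted transaction from the start.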