Transaction management in data migration systems

Data migration is a new paradigm for distributed computing. Processes executing in a data migration system (DMS) need not be aware of the location of the distributed data they access. When a process executing at site s accesses a remote data element d, the system moves d from its current location to s for computation. This thesis addresses the problems of concurrency control and crash recovery in data migration systems. The first contribution of the thesis is a study of the interaction between process-level and memory-level synchronization in a DMS. Most existing studies of DMS treat these two levels of synchronization in isolation. We show that by integrating process-level and memory-level synchronization, the system can achieve significantly better performance. As the synchronization primitive for a DMS, we advocate the transaction model, which simplifies the task of programming in a DMS while leaving room for implementation-level optimization. The second contribution of the thesis is a set of concurrency control and recovery algorithms. We study a token/lock-based algorithm and an optimistic algorithm. Our token-based algorithm is novel in its treatment of the dynamic data-location information that is vital to data migration. We devise techniques that enable non-blocking, atomic token transfers between sites. The resulting protocol incurs low overhead and achieves fast recovery after system failure. Finally, we argue that the optimistic method requires less frequent network interaction than the token-based approach and outperforms it in many circumstances. We study several optimistic strategies suited to different system configurations.
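
As an illustration of the basic migrate-on-access behaviour described above, the following is a minimal single-process sketch in Python. It is not the thesis's protocol: the class DataMigrationSystem, its methods, and the in-memory location table are all hypothetical names chosen for this example, and the sketch ignores concurrency, transactions, token transfer, and failures entirely.

```python
class DataMigrationSystem:
    """Toy model (hypothetical): each data element lives at exactly one site."""

    def __init__(self, sites):
        # site name -> local store of the elements currently held at that site
        self.stores = {s: {} for s in sites}
        # element name -> site currently holding it (the data-location information)
        self.location = {}
        # number of times an element was moved between sites
        self.migrations = 0

    def create(self, site, name, value):
        """Create element `name` at `site`."""
        self.stores[site][name] = value
        self.location[name] = site

    def access(self, site, name):
        """Return the value of `name` at `site`, migrating it there first if remote."""
        holder = self.location[name]
        if holder != site:
            # Move (not copy) the element: remove it from the holder, install it locally.
            self.stores[site][name] = self.stores[holder].pop(name)
            self.location[name] = site
            self.migrations += 1
        return self.stores[site][name]

    def update(self, site, name, value):
        """Write `value` to `name` from `site`; the element is migrated if remote."""
        self.access(site, name)
        self.stores[site][name] = value


# A process at site "s" increments element "d", which initially lives at site "t".
dms = DataMigrationSystem(["s", "t"])
dms.create("t", "d", 10)
dms.update("s", "d", dms.access("s", "d") + 1)
```

In this sketch the process at s never names site t; the location table is consulted and updated on its behalf. Keeping that table consistent under concurrent access and site crashes is the kind of problem the token-based and optimistic algorithms are concerned with.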