High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations

Computing requirements for parallel applications have increased massively. These applications, and the cluster services that support them, often need to share system-wide resources, and access to those resources is typically coordinated by a distributed lock manager. The performance of the lock manager is therefore critical to application performance. Researchers have shown that two-sided communication protocols such as TCP/IP, which current-generation lock managers use, can significantly limit the scalability of distributed lock managers. In addition, existing locking designs based on one-sided communication support only exclusive-mode locking, which can severely limit scalability for applications that need both shared and exclusive access modes, such as cooperative or file-system caching. The utility of these existing designs in high-performance scenarios can therefore be limited. In this paper, we present a novel protocol for distributed locking services that uses the network-level one-sided atomic operations provided by InfiniBand. Our approach augments existing approaches by eliminating the need for two-sided communication protocols in the critical locking path. We also demonstrate that our approach provides significantly higher performance in scenarios that need both shared-mode and exclusive-mode access to resources. Our experimental results show a 39% improvement in basic locking latencies over traditional send/receive-based implementations. We also observe a significant improvement (up to 317% for 16 nodes) over existing RDMA-based distributed queuing schemes in shared-mode locking scenarios.
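The abstract does not spell out the protocol itself, so the following is only a minimal sketch of how shared and exclusive lock modes could be built from two atomic primitives, compare-and-swap and fetch-and-add, applied to a single 64-bit word. Everything in it is an assumption made for illustration: the word layout (holder id in the upper 32 bits, shared-holder count in the lower 32 bits), the class `RemoteWord`, and all function names are hypothetical, and the remote word is simulated in local memory rather than accessed over InfiniBand. It implements simple try-lock semantics with no queuing, so it should not be read as the protocol proposed in the paper.

```python
import threading

MASK64 = (1 << 64) - 1   # the simulated lock word is 64 bits wide
EXCL_SHIFT = 32          # assumed layout: upper 32 bits = exclusive holder id


class RemoteWord:
    """Local stand-in for a 64-bit lock word on a lock's home node (hypothetical)."""

    def __init__(self):
        self._value = 0
        # The mutex only models the atomicity a network adapter would provide.
        self._mutex = threading.Lock()

    def compare_and_swap(self, expected, new):
        """Store `new` if the word equals `expected`; always return the old value."""
        with self._mutex:
            old = self._value
            if old == expected:
                self._value = new & MASK64
            return old

    def fetch_and_add(self, delta):
        """Add `delta` (modulo 2**64) and return the value seen before the add."""
        with self._mutex:
            old = self._value
            self._value = (old + delta) & MASK64
            return old


def try_lock_exclusive(word, node_id):
    """Succeed only if the word is completely free. `node_id` must be non-zero."""
    return word.compare_and_swap(0, node_id << EXCL_SHIFT) == 0


def unlock_exclusive(word, node_id):
    # Subtract rather than swap, so transient shared attempts are not lost.
    word.fetch_and_add(-(node_id << EXCL_SHIFT))


def try_lock_shared(word):
    """Register as a shared holder; back out if an exclusive holder is present."""
    old = word.fetch_and_add(1)
    if old >> EXCL_SHIFT == 0:
        return True
    word.fetch_and_add(-1)   # undo our increment, the lock is held exclusively
    return False


def unlock_shared(word):
    word.fetch_and_add(-1)
```

In this sketch a shared request costs one atomic operation when no exclusive holder is present, and neither mode requires the home node's CPU to process a message, which is the general property the abstract attributes to one-sided designs. A real design would also need a policy for waiters, such as queuing or notification, which the sketch deliberately omits.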

[1]  Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 14-17 May 2007, Rio de Janeiro, Brazil , 2007, CCGRID.
