CrossTree: A new HTC architecture with high reliability and scalability

HTC (high-throughput computing) is an environment that can provide large amounts of processing capacity over long periods of time. In HTC, users care more about how many jobs can be completed over a long period than about how quickly any single job finishes. Condor, an implementation of HTC, is built from commodity CPUs and memory. Because all Condor nodes are controlled by a central management node, Condor's reliability and scalability are limited. Building on the concept of a DHT (distributed hash table), this paper presents a new distributed HTC architecture, named CrossTree, which has no central components and distributes its metadata across all nodes in the system. Theoretical analysis and simulation results show CrossTree to be an efficient architecture with high scalability and reliability.
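The abstract does not say how CrossTree itself places metadata. As a hypothetical illustration of the general DHT idea it builds on, the sketch below uses Chord-style consistent hashing. Each job-metadata key is owned by the first node clockwise from the key's position on a hash ring, so no central node has to be consulted. The names `HashRing`, `ring_id`, `node0`..`node7` and `job-N` are illustrative assumptions and are not taken from the paper.

```python
import bisect
import hashlib


def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto a 2**bits identifier ring (SHA-1, reduced mod 2**bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class HashRing:
    """Hypothetical consistent-hashing ring: each key belongs to its successor node."""

    def __init__(self, nodes):
        # Keep (id, name) pairs sorted so successor lookup is a binary search.
        self._ring = sorted((ring_id(n), n) for n in nodes)

    def add_node(self, name: str) -> None:
        bisect.insort(self._ring, (ring_id(name), name))

    def remove_node(self, name: str) -> None:
        self._ring.remove((ring_id(name), name))

    def owner(self, key: str) -> str:
        """Return the first node whose id is >= the key's id, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        ids = [node_id for node_id, _ in self._ring]
        i = bisect.bisect_left(ids, ring_id(key))
        return self._ring[i % len(self._ring)][1]


# Usage: spread job metadata over 8 nodes, then simulate one node failing.
ring = HashRing([f"node{i}" for i in range(8)])
jobs = [f"job-{j}" for j in range(1000)]
before = {j: ring.owner(j) for j in jobs}
ring.remove_node("node3")
after = {j: ring.owner(j) for j in jobs}
moved = {j for j in jobs if before[j] != after[j]}
```

In this sketch, removing a node reassigns only the keys that node owned, and every other key keeps its owner. This property of consistent hashing is what lets a DHT-based design tolerate node churn without a central manager.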
