This paper presents a new distributed computing framework for Many Task Computing (MTC) applications, based on the Extensible Messaging and Presence Protocol (XMPP). A lightweight, highly available system, named Kestrel, has been developed to explore XMPP-based techniques for improving the tolerance of MTC systems to faults that result from scaling and from the intermittent presence of computing agents. By leveraging technologies used in large instant messaging systems that scale to millions of clients, Kestrel is designed to scale to millions of agents at various levels of granularity: cores, machines, clusters, and even sensors.
Kestrel's architecture is inspired by the distributed design of pilot job frameworks on the grid as well as botnets, with the addition of a commodity instant messaging protocol for communications. Whereas botnet command-and-control systems have frequently used a combination of Internet Relay Chat (IRC), Distributed Hash Table (DHT), and other Peer-to-Peer (P2P) technologies, Kestrel uses XMPP for its presence notification capabilities, which allow the system to track machine presence and state continuously in real time. XMPP is also easily extensible with application-specific sub-protocols, which can be used to transfer machine profile descriptions and job requirements. These sub-protocols can in turn be used to implement distributed matching of jobs to systems, using a mechanism similar to ClassAds in the Condor High Throughput Computing (HTC) system.
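The presence-driven matching described above can be illustrated with a minimal sketch. It is not Kestrel's implementation: the `Agent`, `Job`, and `Pool` names, the set-based capability model, and the plain-string presence states are all assumptions made for illustration, and no XMPP library is involved. The sketch only shows the idea that an agent becomes eligible for work when it announces presence and advertises a profile that satisfies a job's requirements.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    jid: str                  # XMPP-style address of the worker (hypothetical)
    capabilities: frozenset   # advertised machine profile, modeled as tags
    available: bool = False   # last presence state seen for this agent


@dataclass
class Job:
    job_id: str
    requirements: frozenset   # tags an agent must advertise to run the job


class Pool:
    """Tracks agent presence and matches jobs to available agents."""

    def __init__(self):
        self.agents = {}

    def register(self, jid, capabilities):
        # Record the machine profile an agent sent when it joined.
        self.agents[jid] = Agent(jid, frozenset(capabilities))

    def on_presence(self, jid, state):
        # Simplified presence handling: "available" marks the agent online,
        # any other state (e.g. "unavailable") marks it offline.
        agent = self.agents.get(jid)
        if agent is not None:
            agent.available = (state == "available")

    def match(self, job):
        # Return the first online agent whose profile covers the job's
        # requirements, or None if no such agent is currently present.
        for agent in self.agents.values():
            if agent.available and job.requirements <= agent.capabilities:
                return agent.jid
        return None


pool = Pool()
pool.register("worker1@example.org", {"linux", "x86_64"})
pool.on_presence("worker1@example.org", "available")
print(pool.match(Job("job-1", frozenset({"linux"}))))
```

Because an agent that drops offline is simply skipped by `match`, intermittent presence changes which agents are eligible without requiring any other bookkeeping in this simplified model.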